y = β₀ + β₁x
Where:
β₀ is the intercept.
β₁ is the slope.
y = β₀ + β₁x + β₂x²
Where:
β₂ is the coefficient of the squared term.
The Curve:
The x² term introduces a curve into the relationship.
If β₂ is positive, the curve opens upward (like a U).
If β₂ is negative, the curve opens downward (like an inverted U).
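As a small illustration of the sign rule above, the sketch below fits a quadratic to made-up data and reads the direction of the curve from the sign of the squared-term coefficient. The vectors `x`, `y_up`, and `y_down` are hypothetical and are not taken from the study data.

```r
# Hypothetical data, chosen only to illustrate the sign of the squared term
x      <- 1:10
y_up   <- 2 + 0.5 * x + 0.3 * x^2   # true beta2 = +0.3 (U shape)
y_down <- 2 + 0.5 * x - 0.3 * x^2   # true beta2 = -0.3 (inverted U)

fit_up   <- lm(y_up ~ x + I(x^2))
fit_down <- lm(y_down ~ x + I(x^2))

beta2_up   <- unname(coef(fit_up)["I(x^2)"])
beta2_down <- unname(coef(fit_down)["I(x^2)"])

# Turning point of a quadratic: x* = -beta1 / (2 * beta2)
vertex_up <- -unname(coef(fit_up)["x"]) / (2 * beta2_up)

cat("beta2 (upward curve):  ", beta2_up, "\n")
cat("beta2 (downward curve):", beta2_down, "\n")
cat("turning point of the upward curve:", vertex_up, "\n")
```

A positive estimate for the squared term corresponds to the U shape, and a negative one to the inverted U.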
# Descriptive statistics
Cleaned_TaMA_Data %>% skim(Population)
| Name | Piped data |
| Number of rows | 11 |
| Number of columns | 76 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Population | 0 | 1 | 552181.8 | 97862.98 | 411000 | 475500 | 549000 | 627000 | 701000 | ▇▅▅▅▅ |
Cleaned_TaMA_Data %>% skim(IGF)
| Name | Piped data |
| Number of rows | 11 |
| Number of columns | 76 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| IGF | 0 | 1 | 1799544 | 679366.3 | 945774.9 | 1348552 | 1608987 | 2213410 | 3215765 | ▇▇▅▅▂ |
# Histograms
ggplot(Cleaned_TaMA_Data, aes(x = Population)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Population", x = "Population", y = "Frequency") +
scale_x_continuous(labels = comma)
ggplot(Cleaned_TaMA_Data, aes(x = IGF)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of IGF Revenue", x = "IGF Revenue", y = "Frequency") +
scale_x_continuous(labels = comma)
# Growth Rate (Percentage)
Cleaned_TaMA_Data <- Cleaned_TaMA_Data %>%
mutate(
Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
IGF_Growth_Rate = c(NA, diff(IGF) / IGF[-length(IGF)] * 100)
)
# Plot of Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Population)) +
geom_point(aes(y = Population), color = "dodgerblue") +
labs(title = "Population Trend", x = "Year", y = "Population") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = IGF)) +
geom_point(aes(y = IGF), color = "dodgerblue") +
labs(title = "IGF Trend", x = "Year", y = "IGF") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Population, color = "Population")) +
geom_point(aes(y = Population, color = "Population")) +
geom_line(aes(y = IGF, color = "IGF")) +
geom_point(aes(y = IGF, color = "IGF")) +
labs(title = "Population vs. IGF Revenue", x = "Year", y = "Amount/Population", color = "Type") +
scale_y_continuous(labels = comma)
# Growth rate plots
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Population_Growth_Rate, color = "Population Growth")) +
geom_point(aes(y = Population_Growth_Rate, color = "Population Growth")) +
geom_line(aes(y = IGF_Growth_Rate, color = "IGF Growth")) +
geom_point(aes(y = IGF_Growth_Rate, color = "IGF Growth")) +
labs(title = "Population Growth vs. IGF Growth", x = "Year", y = "Growth Rate (%)", color = "Type") +
scale_y_continuous(labels = percent_format(scale = 1)) +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") # Add horizontal line at zero
The histograms show an uneven distribution of population and IGF revenue, with population values most frequent around 450,000. The trend plots clearly show that IGF revenue (which experienced significant changes) moves in the same direction as population (which rose steadily).
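The growth rates plotted above are year-over-year percentage changes. The short sketch below works through the same formula on a made-up series; the vector `value` is hypothetical and only illustrates the arithmetic.

```r
# Hypothetical yearly values, for illustration only
value <- c(100, 110, 99, 120)

# Year-over-year growth (%): (this year - last year) / last year * 100
# The first year has no predecessor, so its growth rate is NA.
growth <- c(NA, diff(value) / head(value, -1) * 100)

print(round(growth, 2))
```

Here `head(value, -1)` drops the last element, which is equivalent to the `Population[-length(Population)]` indexing used in the analysis.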
mod1 <- lm(IGF ~ Population, data = Cleaned_TaMA_Data)
summary(mod1)
##
## Call:
## lm(formula = IGF ~ Population, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -280063 -142033 -5919 109601 433935
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1845174.9832 401403.2226 -4.597 0.0013 **
## Population 6.6006 0.7168 9.209 0.00000708 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221800 on 9 degrees of freedom
## Multiple R-squared: 0.9041, Adjusted R-squared: 0.8934
## F-statistic: 84.8 on 1 and 9 DF, p-value: 0.000007076
Cleaned_TaMA_Data %>%
ggplot(aes(x = Population, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and IGF Revenue") +
scale_y_continuous(labels = scales::comma)
# The Quadratic Term
Cleaned_TaMA_Data$Population_Squared <- Cleaned_TaMA_Data$Population^2
# Quadratic Regression
mod_quad <- lm(IGF ~ Population + Population_Squared, data = Cleaned_TaMA_Data)
summary(mod_quad)
##
## Call:
## lm(formula = IGF ~ Population + Population_Squared, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -325625 -76598 -40625 129711 307495
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1172389.975557798 2698329.843956656 0.434 0.675
## Population -4.586448127 9.920722573 -0.462 0.656
## Population_Squared 0.000010075 0.000008912 1.131 0.291
##
## Residual standard error: 218500 on 8 degrees of freedom
## Multiple R-squared: 0.9173, Adjusted R-squared: 0.8966
## F-statistic: 44.35 on 2 and 8 DF, p-value: 0.00004685
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) + # Use formula for quadratic
labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Quadratic Relationship between Population and IGF Revenue") +
scale_y_continuous(labels = comma)
Linear Regression:
Coefficients:
Intercept: -1845174.9832
Population: 6.6006. For each additional person in the population, IGF is predicted to increase by approximately 6.60 Ghana Cedis.
P-values: Intercept: 0.0013 (significant)
Population: 0.00000708 (significant)
R-squared: Multiple R-squared: 0.9041
Adjusted R-squared:0.8934
Interpretation: The linear model shows a very strong and statistically significant relationship between population and IGF revenue.
Population explains about 90.41% of the variance in IGF.
Quadratic Regression:
Coefficients: Intercept: 1172389.975557
Population: -4.586448127
Population_Squared: 0.000010075
P-values: None of the individual coefficients is statistically significant (all p > 0.05), but the overall model is statistically significant (p-value = 0.00004685).
R-squared: Multiple R-squared: 0.9173
Adjusted R-squared: 0.8966
Interpretation: The quadratic model as a whole shows a strong and statistically significant relationship between population and IGF revenue. However, the squared term is not statistically significant (p = 0.291), so there is no evidence of curvature beyond the linear trend.
The R-squared of 0.9173 indicates that the quadratic model explains 91.73% of the variance in IGF, only a slight improvement over the linear model (90.41%), and the adjusted R-squared barely changes (0.8966 vs. 0.8934).
Based on the statistical significance of the coefficients, the linear model is preferable.
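One way to formalise the choice between the two nested models is a partial F-test with `anova()`. The sketch below runs it on simulated data; the variables `x` and `y` are hypothetical stand-ins, not the study data, so the numbers it prints say nothing about IGF.

```r
set.seed(1)                          # reproducible toy data
x <- seq(4, 7, length.out = 11)      # hypothetical predictor
y <- 5 + 2 * x + rnorm(11, sd = 0.5) # linear truth plus noise

fit_lin  <- lm(y ~ x)
fit_quad <- lm(y ~ x + I(x^2))

# Partial F-test: does adding the squared term significantly improve the fit?
cmp <- anova(fit_lin, fit_quad)
print(cmp)

# With a single added term, the F statistic equals the squared t statistic
# of that term, so both tests give the same p-value.
f_stat <- cmp$F[2]
t_stat <- summary(fit_quad)$coefficients["I(x^2)", "t value"]
cat("F =", f_stat, " t^2 =", t_stat^2, "\n")
```

Because only one term is added, this test agrees with the t-test on the squared term reported in the model summary; it becomes more informative when several terms are added at once.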
Transformations
# Transformed Model
lm(Ln_IGF ~ Ln_Pop, data = Cleaned_TaMA_Data) %>% summary()
##
## Call:
## lm(formula = Ln_IGF ~ Ln_Pop, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.115868 -0.058994 -0.004073 0.063445 0.126499
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -12.704 2.140 -5.936 0.000219 ***
## Ln_Pop 2.048 0.162 12.638 0.000000495 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09207 on 9 degrees of freedom
## Multiple R-squared: 0.9467, Adjusted R-squared: 0.9407
## F-statistic: 159.7 on 1 and 9 DF, p-value: 0.0000004948
# Scatter Plots (Transformed Data)
ggplot(Cleaned_TaMA_Data, aes(x = Ln_Pop, y = Ln_IGF)) +
geom_point() +
geom_smooth(method = "lm") +
labs(title = "Log(Population) vs. Log(IGF Revenue)", x = "Log(Population)", y = "Log(IGF Revenue)")
After the log transformation, the log-log model fits better than the simple linear model, and the relationship is strongly significant (p-value: 0.0000004948, R-squared: 0.9467). Among the models so far, the log-log model has the highest R-squared, although R-squared values computed on the log scale are not strictly comparable with those on the original scale.
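In a log-log model the slope can be read as an elasticity. The sketch below takes the `Ln_Pop` coefficient from the output above and works out the predicted percentage change in IGF for a 1% rise in population; the variable names are illustrative.

```r
slope <- 2.048          # Ln_Pop coefficient reported in the output above
pct_increase_pop <- 1   # a 1% rise in population

# In a log-log model, multiplying x by (1 + p) multiplies the predicted y
# by (1 + p)^slope.
pct_change_igf <- ((1 + pct_increase_pop / 100)^slope - 1) * 100

cat("Predicted change in IGF (%):", round(pct_change_igf, 2), "\n")
```

The result is roughly 2%, which is why the slope of a log-log model is often read directly as "a 1% change in x is associated with about a slope-% change in y".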
# Scatter Plot
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = IGF)) +
geom_point() +
labs(title = "Population vs. IGF Revenue", x = "Population", y = "IGF Revenue")
# Residual
ggplot(data = data.frame(residuals = residuals(mod1), fitted = fitted(mod1)), aes(x = fitted, y = residuals)) +
geom_point() + # Added geom_point()
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Linear) ", x = "Fitted Values", y = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod1)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals(Linear)", x = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod1)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals")
# Residuals vs. Fitted Values
ggplot(data = data.frame(residuals = residuals(mod_quad), fitted = fitted(mod_quad)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Quadratic Model)", x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals (Quadratic Model)", x = "Residuals")
# Q-Q Plot of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals (Quadratic Model)")
# Durbin-Watson Test (Autocorrelation)
dwtest(mod1)
##
## Durbin-Watson test
##
## data: mod1
## DW = 1.9465, p-value = 0.3137
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(mod_quad)
##
## Durbin-Watson test
##
## data: mod_quad
## DW = 2.0658, p-value = 0.233
## alternative hypothesis: true autocorrelation is greater than 0
# Breusch-Pagan Test (Homoscedasticity)
bptest(mod1)
##
## studentized Breusch-Pagan test
##
## data: mod1
## BP = 6.7204, df = 1, p-value = 0.009532
bptest(mod_quad)
##
## studentized Breusch-Pagan test
##
## data: mod_quad
## BP = 9.3229, df = 2, p-value = 0.009453
# Variance Inflation Factor (VIF) - Multicollinearity
# mod1 has a single predictor, so VIF does not apply to it; only mod_quad is checked.
vif(mod_quad)
## Population Population_Squared
## 197.4834 197.4834
For the linear model, all the assumptions are met except homoscedasticity. For the quadratic model, neither the homoscedasticity assumption nor the no-multicollinearity assumption is satisfied.
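The very large VIF of the quadratic model arises because a variable and its square are almost perfectly correlated when the values lie far from zero. The sketch below illustrates this under stated assumptions: the range comes from the summary table above, but the 11 evenly spaced values and the helper `vif_two()` are hypothetical.

```r
# Range taken from the summary table above; the evenly spaced values are
# hypothetical stand-ins for the real yearly population figures.
pop <- seq(411000, 701000, length.out = 11)

# With exactly two predictors, both share the same VIF: 1 / (1 - r^2),
# where r is the correlation between the two predictors.
vif_two <- function(a, b) 1 / (1 - cor(a, b)^2)

vif_raw <- vif_two(pop, pop^2)

# Centering the predictor before squaring removes most of the correlation
pop_centered <- pop - mean(pop)
vif_centered <- vif_two(pop_centered, pop_centered^2)

cat("VIF with raw terms:     ", round(vif_raw, 1), "\n")
cat("VIF with centered terms:", round(vif_centered, 3), "\n")
```

This also explains why `vif(mod_quad)` reports the same value for both terms. Centering (or `poly(Population, 2)`) is a common remedy, though it changes the interpretation of the lower-order coefficient.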
To address the heteroscedasticity of the linear model, we use robust standard errors.
# Robust standard errors using HC3 (a common method)
robust_se <- coeftest(mod1, vcov. = vcovHC(mod1, type = "HC3"))
print(robust_se)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1845174.9832 574978.9583 -3.2091 0.0106740 *
## Population 6.6006 1.1645 5.6680 0.0003065 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Diagnostic Plots
# Residuals vs. Fitted
ggplot(data = data.frame(residuals = residuals(mod1), fitted = fitted(mod1)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Linear Model)", x = "Fitted Values", y = "Residuals")
cat("
## Linear Model with Robust Standard Errors
", capture.output(print(robust_se)), "
The Breusch-Pagan test indicated heteroscedasticity (p < 0.05). To address this, robust standard errors (HC3) were used. The robust standard errors adjust for the non-constant variance of the residuals to provide more reliable estimates of the coefficients' standard errors and p-values.
The linear model shows a highly significant relationship between Population and IGF revenue, even when using robust standard errors. The R-squared value is ", summary(mod1)$r.squared,", indicating that ", round(summary(mod1)$r.squared * 100, 2), "% of the variance in IGF is explained by Population.
")
##
## ## Linear Model with Robust Standard Errors
##
## t test of coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -1845174.9832 574978.9583 -3.2091 0.0106740 * Population 6.6006 1.1645 5.6680 0.0003065 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## The Breusch-Pagan test indicated heteroscedasticity (p < 0.05). To address this, robust standard errors (HC3) were used. The robust standard errors adjust for the non-constant variance of the residuals to provide more reliable estimates of the coefficients' standard errors and p-values.
##
## The linear model shows a highly significant relationship between Population and IGF revenue, even when using robust standard errors. The R-squared value is 0.9040508 , indicating that 90.41 % of the variance in IGF is explained by Population.
Therefore, the analysis so far shows a strong and statistically significant positive linear relationship between population and IGF revenue. Population growth is a strong indicator of rising IGF revenue. The assumptions are met except homoscedasticity, which is addressed with robust standard errors.
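To make the HC3 adjustment less of a black box, the sketch below computes HC3 standard errors by hand in base R on simulated heteroscedastic data. The data and variable names are hypothetical; the analysis itself relies on `vcovHC()` from the sandwich package.

```r
set.seed(42)
n <- 11
x <- 1:n
y <- 3 + 2 * x + rnorm(n, sd = 0.5 * x)  # noise grows with x (heteroscedastic)

fit <- lm(y ~ x)
X   <- model.matrix(fit)                 # design matrix (intercept and x)
e   <- residuals(fit)
h   <- hatvalues(fit)                    # leverage of each observation

# Sandwich estimator: (X'X)^-1  X' diag(e^2 / (1 - h)^2) X  (X'X)^-1
bread    <- solve(crossprod(X))
meat     <- crossprod(X * (e / (1 - h))) # each row of X scaled by e_i / (1 - h_i)
vcov_hc3 <- bread %*% meat %*% bread

robust_se  <- sqrt(diag(vcov_hc3))
classic_se <- sqrt(diag(vcov(fit)))

print(rbind(classic = classic_se, HC3 = robust_se))
```

The coefficient estimates are unchanged by this adjustment; only the standard errors, and therefore the t values and p-values, differ.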
Cleaned_TaMA_Data %>% skim(Population)
| Name | Piped data |
| Number of rows | 11 |
| Number of columns | 79 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Population | 0 | 1 | 552181.8 | 97862.98 | 411000 | 475500 | 549000 | 627000 | 701000 | ▇▅▅▅▅ |
Cleaned_TaMA_Data %>% skim(DACF)
| Name | Piped data |
| Number of rows | 11 |
| Number of columns | 79 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| DACF | 0 | 1 | 2939617 | 1388239 | 802346.2 | 1971853 | 3299111 | 3928373 | 4833171 | ▇▁▅▇▇ |
# Histograms
ggplot(Cleaned_TaMA_Data, aes(x = Population)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Population", x = "Population")
ggplot(Cleaned_TaMA_Data, aes(x = DACF)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of DACF Revenue", x = "DACF Revenue")
# Growth Rates (Percentage)
Cleaned_TaMA_Data <- Cleaned_TaMA_Data %>%
mutate(
Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
DACF_Growth_Rate = c(NA, diff(DACF) / DACF[-length(DACF)] * 100)
)
# Plotting Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Population)) +
geom_point(aes(y = Population), color = "dodgerblue") +
labs(title = "Population Trend", x = "Year", y = "Population") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = DACF)) +
geom_point(aes(y = DACF), color = "dodgerblue") +
labs(title = "DACF Trend", x = "Year", y = "DACF") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Population, color = "Population")) +
geom_point(aes(y = Population, color = "Population")) +
geom_line(aes(y = DACF, color = "DACF")) +
geom_point(aes(y = DACF, color = "DACF")) +
labs(title = "Population vs. DACF Revenue", x = "Year", y = "Amount/Population", color = "Type") +
scale_y_continuous(labels = scales::comma)
# Plotting Growth Rates
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Population_Growth_Rate, color = "Population Growth")) +
geom_point(aes(y = Population_Growth_Rate, color = "Population Growth")) +
geom_line(aes(y = DACF_Growth_Rate, color = "DACF Growth")) +
geom_point(aes(y = DACF_Growth_Rate, color = "DACF Growth")) +
labs(title = "Population Growth vs. DACF Growth", x = "Year", y = "Growth Rate (%)", color = "Type")+
geom_hline(yintercept = 0, linetype = "dashed", color = "red")
The histograms show an uneven distribution of population and DACF revenue. The trend plots clearly show that DACF revenue (which experienced significant changes) moves in the same direction as population (which rose steadily).
mod2 <- lm(DACF ~ Population, data = Cleaned_TaMA_Data)
summary(mod2)
##
## Call:
## lm(formula = DACF ~ Population, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -934903 -463153 -325069 190219 2052059
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3379580.055 1564725.131 -2.160 0.05908 .
## Population 11.444 2.794 4.096 0.00269 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 864700 on 9 degrees of freedom
## Multiple R-squared: 0.6508, Adjusted R-squared: 0.612
## F-statistic: 16.78 on 1 and 9 DF, p-value: 0.002694
Cleaned_TaMA_Data %>%
ggplot(aes(x = Population, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) + # Added confidence intervals
labs(x = "Population", y = "DACF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and DACF Revenue") +
scale_y_continuous(labels = scales::comma)
There is a statistically significant positive relationship between population and DACF revenue performance patterns. As population increases, DACF tends to increase. Population explains only 65.08% of the variance in DACF.
#Scatter Plot
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = DACF)) +
geom_point() +
labs(title = "Population vs. DACF Revenue",
x = "Population", y = "DACF Revenue")
# Residual
ggplot(data = data.frame(residuals = residuals(mod2),
fitted = fitted(mod2)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted",
x = "Fitted Values", y = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod2)),
aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals", x = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod2)),
aes(sample = residuals)) +
stat_qq() +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals ")
shapiro.test(resid(mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(mod2)
## W = 0.83717, p-value = 0.029
# Autocorrelation
dwtest(mod2)
##
## Durbin-Watson test
##
## data: mod2
## DW = 2.1657, p-value = 0.4609
## alternative hypothesis: true autocorrelation is greater than 0
# Homoscedasticity (Constant Variance of Residuals)
bptest(mod2)
##
## studentized Breusch-Pagan test
##
## data: mod2
## BP = 0.99812, df = 1, p-value = 0.3178
# Multicollinearity
# This is a simple linear regression with one predictor (Population), so multicollinearity is not an issue.
# Multivariate Normality
# With a single response and a single predictor (Population), this reduces to normality of the residuals, which the Shapiro-Wilk test above checks.
The scatter plot shows a positive, roughly linear relationship: as population increases, DACF revenue tends to increase as well. The histogram of residuals shows a potential violation of the normality assumption, and the Shapiro-Wilk test confirms it (p = 0.029). The Durbin-Watson test reveals no autocorrelation, and the Breusch-Pagan test indicates homoscedasticity.
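The Durbin-Watson statistic reported by `dwtest()` is simple to compute directly from the residuals. The sketch below does so on simulated data (the variables `year` and `y` are hypothetical) to show what the number measures.

```r
set.seed(7)
year <- 1:11
y    <- 10 + 1.5 * year + rnorm(11)  # hypothetical series with independent noise

fit <- lm(y ~ year)
e   <- residuals(fit)

# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by the residual sum of squares.
dw <- sum(diff(e)^2) / sum(e^2)

cat("Durbin-Watson statistic:", round(dw, 3), "\n")
```

The statistic always lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.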
#Transformed Models
log_mod2 <- lm(log(DACF) ~ log(Population), data = Cleaned_TaMA_Data)
summary(log_mod2 )
#
# Call:
# lm(formula = log(DACF) ~ log(Population), data = Cleaned_TaMA_Data)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.47510 -0.15097 -0.09072 0.09068 0.83725
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -24.0066 8.7533 -2.743 0.02275 *
# log(Population) 2.9340 0.6627 4.427 0.00165 **
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 0.3766 on 9 degrees of freedom
# Multiple R-squared: 0.6853, Adjusted R-squared: 0.6504
# F-statistic: 19.6 on 1 and 9 DF, p-value: 0.001654
sqrt_mod2 <- lm( sqrt(DACF)~sqrt(Population), data = Cleaned_TaMA_Data )
summary(sqrt_mod2)
#
# Call:
# lm(formula = sqrt(DACF) ~ sqrt(Population), data = Cleaned_TaMA_Data)
#
# Residuals:
# Min 1Q Median 3Q Max
# -328.74 -131.72 -67.78 52.94 640.34
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -2529.334 981.892 -2.576 0.02989 *
# sqrt(Population) 5.656 1.321 4.280 0.00205 **
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 276.3 on 9 degrees of freedom
# Multiple R-squared: 0.6706, Adjusted R-squared: 0.634
# F-statistic: 18.32 on 1 and 9 DF, p-value: 0.002049
# Scatter Plots (Transformed Data)
ggplot(Cleaned_TaMA_Data, aes(x = log(Population), y = log(DACF))) +
geom_point() +
geom_smooth(method = "lm")+
labs(title = "Log(Population) vs. Log(DACF Revenue)",
x = "Log(Population)", y = "Log(DACF Revenue)")
ggplot(Cleaned_TaMA_Data, aes(x = sqrt(Population), y = sqrt(DACF))) +
geom_point() +
geom_smooth(method = "lm")+
labs(title = "Sqrt(Population) vs. Sqrt(DACF Revenue)",
x = "Sqrt(Population)", y = "Sqrt(DACF Revenue)")
Both the log-log and square root models are statistically significant and fit slightly better than the linear model (R-squared of 0.6853 and 0.6706 versus 0.6508). The log-log model is slightly better because it has the higher R-squared.
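Because each transformed model measures fit on its own scale, one hedged way to compare them is to back-transform the predictions and compute the squared correlation with the observed values on the original scale. The sketch below shows the idea on simulated data; the data and the helper `r2_original_scale()` are hypothetical, and the naive back-transform ignores retransformation bias.

```r
set.seed(11)
x <- seq(400, 700, length.out = 11)                 # hypothetical predictor
y <- exp(0.5 + 1.2 * log(x) + rnorm(11, sd = 0.1))  # hypothetical response

fit_lin <- lm(y ~ x)
fit_log <- lm(log(y) ~ log(x))

# Squared correlation between observed values and predictions, both on the
# original scale of y.
r2_original_scale <- function(observed, predicted) cor(observed, predicted)^2

r2_lin <- r2_original_scale(y, fitted(fit_lin))
r2_log <- r2_original_scale(y, exp(fitted(fit_log)))  # naive back-transform

cat("Linear model, original scale: ", round(r2_lin, 4), "\n")
cat("Log-log model, original scale:", round(r2_log, 4), "\n")
```

For the untransformed model this quantity equals the usual R-squared, so it gives a like-for-like number for the transformed models.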
# Function to perform diagnostic tests and plots
perform_diagnostics <- function(model, model_name) {
# Residuals vs. Fitted
plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")
# Q-Q Plot of Residuals
plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))
# Durbin-Watson Test
dw_test <- dwtest(model)
print(paste("Durbin-Watson Test (", model_name, "):"))
print(dw_test)
# Breusch-Pagan Test
bp_test <- bptest(model)
print(paste("Breusch-Pagan Test (", model_name, "):"))
print(bp_test)
# Print VIF (if applicable)
if (length(coef(model)) > 2) { # Check for multiple predictors
vif_result <- vif(model)
print(paste("VIF (", model_name, "):"))
print(vif_result)
}
# Arrange plots
grid.arrange(plot1, plot2, plot3, nrow = 1)
}
# Perform diagnostics for each model
perform_diagnostics(mod2, "Linear Model")
## [1] "Durbin-Watson Test ( Linear Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 2.1657, p-value = 0.4609
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Linear Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.99812, df = 1, p-value = 0.3178
perform_diagnostics(log_mod2, "Log-Log Model")
## [1] "Durbin-Watson Test ( Log-Log Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.9478, p-value = 0.3146
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Log-Log Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 1.4224, df = 1, p-value = 0.233
perform_diagnostics(sqrt_mod2, "Square Root Model")
## [1] "Durbin-Watson Test ( Square Root Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 2.0621, p-value = 0.3893
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Square Root Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 1.2228, df = 1, p-value = 0.2688
shapiro.test(resid(mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(mod2)
## W = 0.83717, p-value = 0.029
shapiro.test(resid(log_mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(log_mod2)
## W = 0.88369, p-value = 0.1159
shapiro.test(resid(sqrt_mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(sqrt_mod2)
## W = 0.86368, p-value = 0.06426
The diagnostic tests indicate that all three models satisfy the assumptions of no autocorrelation and homoscedasticity. The Shapiro-Wilk normality tests show that only the linear model violates the normality assumption (p = 0.029).
Therefore, all three models are statistically significant, but the log-log model is the best: it has the highest R-squared (0.6853) and meets all the assumptions. The relationship between population and DACF revenue performance is better captured by the log-log model.
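The studentized Breusch-Pagan statistic used in the diagnostics above can be reproduced with an auxiliary regression of the squared residuals on the predictor. The sketch below does this on simulated data; the variables are hypothetical, and it assumes the default studentized (Koenker) form that `bptest()` reports.

```r
set.seed(3)
n <- 11
x <- 1:n
y <- 1 + 2 * x + rnorm(n)   # hypothetical data with constant error variance

fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the model's predictor
aux <- lm(residuals(fit)^2 ~ x)

# Studentized BP statistic: n * R-squared of the auxiliary regression,
# compared with a chi-squared distribution (df = number of predictors).
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

cat("BP =", round(bp_stat, 4), " p-value =", round(bp_p, 4), "\n")
```

A small p-value indicates that the residual variance changes with the predictor; a large one gives no evidence against constant variance.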
# Calculate descriptive statistics
desc_stats <- Cleaned_TaMA_Data %>%
summarize(
Population_mean = mean(Population),
Population_sd = sd(Population),
Population_min = min(Population),
Population_max = max(Population),
Capital_Expenditure_mean = mean(Capital_Expenditure),
Capital_Expenditure_sd = sd(Capital_Expenditure),
Capital_Expenditure_min = min(Capital_Expenditure),
Capital_Expenditure_max = max(Capital_Expenditure),
Recrrent_Expenditure_mean = mean(Recrrent_Expenditure),
Recrrent_Expenditure_sd = sd(Recrrent_Expenditure),
Recrrent_Expenditure_min = min(Recrrent_Expenditure),
Recrrent_Expenditure_max = max(Recrrent_Expenditure)
)
cat("
## Descriptive Statistics
| Statistic | Population | Capital Expenditure | Recurrent Expenditure |
|------------------------|------------|---------------------|-----------------------|
| Mean |", format(desc_stats$Population_mean, big.mark = ",", digits = 2),
"|", format(desc_stats$Capital_Expenditure_mean, big.mark = ",", digits = 2),
"|", format(desc_stats$Recrrent_Expenditure_mean, big.mark = ",", digits = 2), "|
| Standard Deviation |", format(desc_stats$Population_sd, big.mark = ",", digits = 2),
"|", format(desc_stats$Capital_Expenditure_sd, big.mark = ",", digits = 2),
"|", format(desc_stats$Recrrent_Expenditure_sd, big.mark = ",", digits = 2), "|
| Minimum |", format(desc_stats$Population_min, big.mark = ",", digits = 2),
"|", format(desc_stats$Capital_Expenditure_min, big.mark = ",", digits = 2),
"|", format(desc_stats$Recrrent_Expenditure_min, big.mark = ",", digits = 2), "|
| Maximum |", format(desc_stats$Population_max, big.mark = ",", digits = 2),
"|", format(desc_stats$Capital_Expenditure_max, big.mark = ",", digits = 2),
"|", format(desc_stats$Recrrent_Expenditure_max, big.mark = ",", digits = 2), "|
\n")
##
## ## Descriptive Statistics
##
## | Statistic | Population | Capital Expenditure | Recurrent Expenditure |
## |------------------------|------------|---------------------|-----------------------|
## | Mean | 552,182 | 6,065,533 | 2,681,829 |
## | Standard Deviation | 97,863 | 3,978,961 | 1,165,192 |
## | Minimum | 411,000 | 3,061,667 | 864,055 |
## | Maximum | 701,000 | 15,444,357 | 4,119,225 |
# Capital Expenditure Histogram
cap_hist <- ggplot(Cleaned_TaMA_Data, aes(x = Capital_Expenditure)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "skyblue", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Capital Expenditure", x = "Capital Expenditure (Ghana Cedis)", y = "Density") +
scale_x_continuous(labels = comma)
# Recurrent Expenditure Histogram
rec_hist <- ggplot(Cleaned_TaMA_Data, aes(x = Recrrent_Expenditure)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "lightgreen", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Recurrent Expenditure", x = "Recurrent Expenditure (Ghana Cedis)", y = "Density") +
scale_x_continuous(labels = comma)
# Population Histogram
pop_hist <- ggplot(Cleaned_TaMA_Data, aes(x = Population)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "dodgerblue", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Population", x = "Population", y = "Density") +
scale_x_continuous(labels = comma)
cap_hist
rec_hist
pop_hist
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Population)) +
geom_point(aes(y = Population), color = "dodgerblue") +
labs(title = "Population Trend", x = "Year", y = "Population") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
labs(title = "Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
theme(axis.title.y.right = element_text(vjust=2))
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Population, color = "Population")) +
geom_point(aes(y = Population, color = "Population")) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
labs(title = "Population and Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
theme(axis.title.y.right = element_text(vjust=2))
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_line(aes(y = Rec_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
geom_point(aes(y = Rec_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
labs(title = "Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
scale_y_continuous(labels = comma)
# Calculate Per Capita Values
Cleaned_TaMA_Data$Capital_Exp_Per_Capita <- Cleaned_TaMA_Data$Capital_Expenditure / Cleaned_TaMA_Data$Population
# Plotting Trends (Improved)
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Population, color = "Population")) +
geom_point(aes(y = Population, color = "Population")) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
labs(title = "Population and Capital Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
theme(axis.title.y.right = element_text(vjust=2))
# Per Capita Analysis
average_capita <- mean(Cleaned_TaMA_Data$Capital_Exp_Per_Capita)
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_hline(yintercept = average_capita, linetype = "dashed", color = "red")+
labs(title = "Capital Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
scale_y_continuous(labels = comma)
Cleaned_TaMA_Data$Recrrent_Exp_Per_Capita <- Cleaned_TaMA_Data$Recrrent_Expenditure / Cleaned_TaMA_Data$Population
average_rec_capita <- mean(Cleaned_TaMA_Data$Recrrent_Exp_Per_Capita)
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
geom_point(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
geom_hline(yintercept = average_rec_capita, linetype = "dashed", color = "red") +
labs(title = "Recurrent Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Population, color = "Population")) +
geom_point(aes(y = Population, color = "Population")) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
labs(title = "Population and Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
theme(axis.title.y.right = element_text(vjust=2))
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_line(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
geom_point(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
labs(title = "Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
scale_y_continuous(labels = comma)
mod3 <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
summary(mod3)
## Response Capital_Expenditure :
##
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4708778 -1461251 -838424 -180257 8966045
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12733679.01 7247229.12 1.757 0.113
## Population -12.08 12.94 -0.933 0.375
##
## Residual standard error: 4005000 on 9 degrees of freedom
## Multiple R-squared: 0.08822, Adjusted R-squared: -0.01309
## F-statistic: 0.8708 on 1 and 9 DF, p-value: 0.3751
##
##
## Response Recrrent_Expenditure :
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -930985 -527287 105797 429788 1144183
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2747965.898 1253148.192 -2.193 0.05599 .
## Population 9.833 2.238 4.394 0.00173 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 692500 on 9 degrees of freedom
## Multiple R-squared: 0.6821, Adjusted R-squared: 0.6468
## F-statistic: 19.31 on 1 and 9 DF, p-value: 0.001735
mod_cap <- lm(Capital_Expenditure ~ Population, data = Cleaned_TaMA_Data)
summary(mod_cap)
##
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4708778 -1461251 -838424 -180257 8966045
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12733679.01 7247229.12 1.757 0.113
## Population -12.08 12.94 -0.933 0.375
##
## Residual standard error: 4005000 on 9 degrees of freedom
## Multiple R-squared: 0.08822, Adjusted R-squared: -0.01309
## F-statistic: 0.8708 on 1 and 9 DF, p-value: 0.3751
mod_rec <- lm(Recrrent_Expenditure ~ Population, data = Cleaned_TaMA_Data)
summary(mod_rec)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -930985 -527287 105797 429788 1144183
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2747965.898 1253148.192 -2.193 0.05599 .
## Population 9.833 2.238 4.394 0.00173 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 692500 on 9 degrees of freedom
## Multiple R-squared: 0.6821, Adjusted R-squared: 0.6468
## F-statistic: 19.31 on 1 and 9 DF, p-value: 0.001735
Cleaned_TaMA_Data %>%
ggplot(aes(x = Population, y = Capital_Expenditure)) +
geom_point()+
geom_smooth(method = "lm", se = TRUE) + labs(x = "Population", y = "Capital Expenditure", title = "Linear Relationship Population and Capital Expenditure")+
scale_y_continuous(labels = scales::comma)
Cleaned_TaMA_Data %>%
ggplot(aes(x = Population, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Population", y = "Recurrent Expenditure", title = "Linear Relationship Population and Recurrent Expenditure") +
scale_y_continuous(labels = scales::comma)
From the linear regression results, there is a significant positive linear relationship between Population and Recrrent_Expenditure (p-value: 0.001735, R-squared: 0.6821) but no significant relationship between Population and Capital_Expenditure (p-value: 0.3751, R-squared: 0.08822). For each additional person in the population, Recrrent_Expenditure is estimated to increase by 9.833 Ghana Cedis.
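The uncertainty around the recurrent expenditure slope can be made explicit with a 95% confidence interval. The sketch below rebuilds it from the rounded estimate, standard error, and residual degrees of freedom printed in the summary output above. The variable names are illustrative only; on the fitted object, `confint(mod_rec)` should give the same interval up to rounding.

```r
# Values copied (rounded) from the summary output of the recurrent expenditure model
slope_est <- 9.833   # Population coefficient
slope_se  <- 2.238   # its standard error
df_resid  <- 9       # residual degrees of freedom

# 95% confidence interval: estimate +/- t critical value * standard error
t_crit   <- qt(0.975, df = df_resid)
ci_lower <- slope_est - t_crit * slope_se
ci_upper <- slope_est + t_crit * slope_se

# Two-sided p-value recomputed from the reported t value
p_value <- 2 * pt(-abs(4.394), df = df_resid)

round(c(lower = ci_lower, upper = ci_upper, p = p_value), 4)
```

The interval runs from roughly 4.8 to 14.9 Ghana Cedis per additional person and excludes zero, which agrees with the significant p-value.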
# Diagnostic Function
perform_diagnostics <- function(model, model_name) {
# Residuals vs. Fitted
plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")
# Q-Q Plot of Residuals
plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))
# Durbin-Watson Test
dw_test <- dwtest(model)
print(paste("Durbin-Watson Test (", model_name, "):"))
print(dw_test)
# Breusch-Pagan Test
bp_test <- bptest(model)
print(paste("Breusch-Pagan Test (", model_name, "):"))
print(bp_test)
# Print VIF (if applicable)
if (length(coef(model)) > 2) { # Check for multiple predictors
vif_result <- vif(model)
print(paste("VIF (", model_name, "):"))
print(vif_result)
}
# Arrange plots
grid.arrange(plot1, plot2, plot3, nrow = 1)
}
# Perform Diagnostics
# Capital Expenditure
perform_diagnostics(mod_cap, "Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Capital Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.93098, p-value = 0.005943
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Capital Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 1.3067, df = 1, p-value = 0.253
# Recurrent Expenditure
perform_diagnostics(mod_rec, "Recurrent Expenditure Model")
## [1] "Durbin-Watson Test ( Recurrent Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.8214, p-value = 0.2394
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Recurrent Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.17041, df = 1, p-value = 0.6797
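From the above tests, the Recurrent Expenditure Model passes both the Durbin-Watson and Breusch-Pagan tests (p-values above 0.05), but the Capital Expenditure Model shows significant positive autocorrelation in its residuals (DW = 0.93098, p-value = 0.005943) and therefore violates the independence-of-errors assumption.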
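To see what the Durbin-Watson statistic measures, it can be computed by hand from the residuals. The sketch below uses hypothetical data (11 yearly observations with a drifting error term), not the TaMA data, and `dw_statistic` is an illustrative helper name. The `DW` value reported by `dwtest()` is this same ratio; the p-value is not reproduced here.

```r
# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by the residual sum of squares
dw_statistic <- function(e) {
  sum(diff(e)^2) / sum(e^2)
}

# Hypothetical data: 11 yearly observations with a slowly drifting error term
set.seed(1)
x <- 1:11
y <- 2 + 0.5 * x + cumsum(rnorm(11, sd = 0.5))
fit <- lm(y ~ x)

dw <- dw_statistic(residuals(fit))
dw  # values well below 2 point to positive autocorrelation
```

The statistic always lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, which is why DW = 1.8214 for the recurrent model is unproblematic while DW = 0.93098 for the capital model is not.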
# Log Transformation for Recurrent Expenditure
log_rec_mod <- lm(log(Recrrent_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
summary(log_rec_mod)
##
## Call:
## lm(formula = log(Recrrent_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.66977 -0.18757 0.06111 0.23307 0.41234
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.492400159 0.594054473 21.029 0.00000000583 ***
## Population 0.000003997 0.000001061 3.768 0.00443 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3283 on 9 degrees of freedom
## Multiple R-squared: 0.6121, Adjusted R-squared: 0.569
## F-statistic: 14.2 on 1 and 9 DF, p-value: 0.004429
perform_diagnostics(log_rec_mod, "Log Recurrent Expenditure Model")
## [1] "Durbin-Watson Test ( Log Recurrent Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 2.1873, p-value = 0.4761
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Log Recurrent Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 1.473, df = 1, p-value = 0.2249
log_cap_mod <- lm(log(Capital_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
summary(log_cap_mod)
##
## Call:
## lm(formula = log(Capital_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.79149 -0.24915 -0.03883 0.12082 1.02096
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.471755961 0.961751754 17.127 0.0000000355 ***
## Population -0.000001815 0.000001717 -1.057 0.318
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5315 on 9 degrees of freedom
## Multiple R-squared: 0.1104, Adjusted R-squared: 0.01151
## F-statistic: 1.116 on 1 and 9 DF, p-value: 0.3182
perform_diagnostics(log_cap_mod, "Log capital Expenditure Model")
## [1] "Durbin-Watson Test ( Log capital Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.68088, p-value = 0.0006221
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Log capital Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 2.448, df = 1, p-value = 0.1177
Cleaned_TaMA_Data$Ln_Population <- log(Cleaned_TaMA_Data$Population)
Cleaned_TaMA_Data$Ln_Capital_Expenditure <- log(Cleaned_TaMA_Data$Capital_Expenditure)
ggplot(Cleaned_TaMA_Data, aes(x = log(Population), y = log(Capital_Expenditure))) +
geom_point() +
geom_smooth(method = "lm", se = TRUE)+
labs(title = "Log(Population) vs. Log(Capital Expenditure)",
x = "Log(Population)", y = "Log(Capital Expenditure)")
ggplot(Cleaned_TaMA_Data, aes(x = log(Population), y = log(Recrrent_Expenditure))) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Log(Population) vs. Log(Recurrent Expenditure)",
x = "Log(Population)", y = "Log(Recurrent Expenditure)")
# Square root transformation for Capital Expenditure
sqrt_cap_mod <- lm(sqrt(Capital_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
summary(sqrt_cap_mod)
##
## Call:
## lm(formula = sqrt(Capital_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -942.65 -289.22 -94.60 58.88 1483.28
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3636.405029 1283.303572 2.834 0.0196 *
## Population -0.002297 0.002292 -1.002 0.3424
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 709.2 on 9 degrees of freedom
## Multiple R-squared: 0.1004, Adjusted R-squared: 0.0004577
## F-statistic: 1.005 on 1 and 9 DF, p-value: 0.3424
perform_diagnostics(sqrt_cap_mod, "Square root Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Square root Capital Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.79241, p-value = 0.001942
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Square root Capital Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 1.8114, df = 1, p-value = 0.1783
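After the transformations, the recurrent expenditure model is still significant and still meets the assumptions, but the capital expenditure models remain non-significant and still show autocorrelated residuals under both the log and square root transformations.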
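The Population coefficient in the log model is not in Ghana Cedis; it is a change in log expenditure per person. A rough way to read it, sketched below from the rounded coefficient in the output of `log_rec_mod`, is to convert it into a percentage change for a chosen population increment. The 10,000-person step is an arbitrary illustrative choice.

```r
# Population coefficient copied (rounded) from the log recurrent expenditure model
beta_pop <- 0.000003997

# Illustrative increment: 10,000 additional residents (an assumption for this example)
step <- 10000

# In a log-linear model, a change of `step` in x multiplies y by exp(beta * step)
pct_change <- (exp(beta_pop * step) - 1) * 100

round(pct_change, 2)
```

Under this model, an extra 10,000 residents is associated with roughly a 4.1% increase in recurrent expenditure.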
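Cleaned_TaMA_Data$Recrrent_Expenditure_squared <- Cleaned_TaMA_Data$Recrrent_Expenditure^2
Cleaned_TaMA_Data$Capital_Expenditure_squared <- Cleaned_TaMA_Data$Capital_Expenditure^2
# Squared predictor used by mod_quad below (recomputed here so the chunk runs on its own)
Cleaned_TaMA_Data$Population_Squared <- Cleaned_TaMA_Data$Population^2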
mod_quad <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population + Population_Squared, data = Cleaned_TaMA_Data)
# View the summary
summary(mod_quad)
## Response Capital_Expenditure :
##
## Call:
## lm(formula = Capital_Expenditure ~ Population + Population_Squared,
## data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3090996 -1920285 -1038866 1214987 6900062
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -71641592.0657378 42931967.5280382 -1.669 0.1337
## Population 300.7286485 157.8443571 1.905 0.0932 .
## Population_Squared -0.0002817 0.0001418 -1.987 0.0822 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3476000 on 8 degrees of freedom
## Multiple R-squared: 0.3895, Adjusted R-squared: 0.2368
## F-statistic: 2.552 on 2 and 8 DF, p-value: 0.139
##
##
## Response Recrrent_Expenditure :
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Population + Population_Squared,
## data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -930947 -530205 110649 419965 1153130
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2412849.139164112 9071144.591820087 -0.266 0.797
## Population 8.590966910 33.351114993 0.258 0.803
## Population_Squared 0.000001119 0.000029960 0.037 0.971
##
## Residual standard error: 734500 on 8 degrees of freedom
## Multiple R-squared: 0.6822, Adjusted R-squared: 0.6027
## F-statistic: 8.585 on 2 and 8 DF, p-value: 0.01021
# Scatter Plots (Transformed Data)
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
labs(x = "Population", y = "Capital Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Capital Expenditure") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
labs(x = "Population", y = "Recurrent Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Recurrent Expenditure") +
scale_y_continuous(labels = comma)
The quadratic model improves the fit for capital expenditure (R-squared rises from 0.08822 to 0.3895), but the relationship is still not significant at the 0.05 level (F-test p-value: 0.139).
Therefore, from the regression analysis above, the relationship between population and recurrent expenditure is positive, linear, and significant, whereas the relationship between population and capital expenditure appears non-linear but is non-significant. The simple linear regression is the best-fitting model for recurrent expenditure and population.
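For the quadratic form y = β₀ + β₁x + β₂x², the curve turns at x = -β₁ / (2β₂). The sketch below applies this to the rounded capital expenditure coefficients from the `mod_quad` output. Because neither coefficient is significant and β₂ is printed to only four significant digits, the result is indicative at best. The population range is taken from the descriptive statistics earlier in the document.

```r
# Coefficients copied (rounded) from the capital expenditure quadratic model
b1 <- 300.7286485    # Population
b2 <- -0.0002817     # Population_Squared

# Turning point of y = b0 + b1*x + b2*x^2
turning_point <- -b1 / (2 * b2)

# A negative b2 means the curve opens downward (inverted U)
opens_downward <- b2 < 0

# Observed Population range (p0 and p100 from the skim output)
pop_range <- c(411000, 701000)
inside_range <- turning_point >= pop_range[1] && turning_point <= pop_range[2]

round(turning_point)
```

The estimated turning point falls at a population of roughly 534,000, inside the observed range, which is consistent with an inverted-U shape in the fitted curve.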
Using total revenue growth rate and infrastructure delivery (capital expenditure per capita).
# Descriptive statistics
Cleaned_TaMA_Data %>% skim(Capital_Exp_Per_Capita)
| Name | Piped data |
| Number of rows | 11 |
| Number of columns | 85 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Capital_Exp_Per_Capita | 0 | 1 | 11.56 | 8.18 | 4.6 | 6.55 | 8.36 | 12.8 | 29.82 | ▇▂▁▁▁ |
Cleaned_TaMA_Data %>% skim(TtRev_Growth_Rate)
| Name | Piped data |
| Number of rows | 11 |
| Number of columns | 85 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| TtRev_Growth_Rate | 1 | 0.91 | 6.83 | 23.53 | -42.14 | 4.41 | 8.92 | 16.84 | 39.13 | ▂▂▂▇▃ |
# Histograms
ggplot(Cleaned_TaMA_Data, aes(x = Capital_Exp_Per_Capita)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Capital expenditure per capita", x = "Capital expenditure per capita") +
scale_x_continuous(labels = comma)
ggplot(Cleaned_TaMA_Data, aes(x = TtRev_Growth_Rate)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Total Revenue Growth Rate", x = "Total revenue growth rate") +
scale_x_continuous(labels = percent_format(scale = 1)) # values are already in percent units
# Plotting Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = TtRev_Growth_Rate, color = "Total Revenue Growth Rate")) +
geom_point(aes(y = TtRev_Growth_Rate, color = "Total Revenue Growth Rate")) +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Expenditure Per Capita")) +
geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Expenditure Per Capita")) +
labs(
title = "Total Revenue Growth Rate vs. Capital Expenditure Per Capita",
x = "Year",
y = "Total Revenue Growth Rate (%)"
) +
scale_y_continuous(
labels = percent_format(scale = 1),
sec.axis = sec_axis(~., name = "Capital Expenditure Per Capita")
) +
scale_color_manual(
values = c("Total Revenue Growth Rate" = "lightseagreen", "Capital Expenditure Per Capita" = "indianred"),
name = "Type"
) +
theme(axis.title.y.right = element_text(vjust = 2))
The histograms show an uneven, right-skewed distribution of capital expenditure per capita. The trend plots show clearly that the trend in total revenue growth rate (which experienced significant changes) is not directly linked to the trend in capital expenditure per capita (which remained stable).
mod5 <- lm(Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_TaMA_Data)
summary(mod5)
##
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.612 -5.977 -2.603 2.961 17.699
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.33810 2.88082 3.936 0.00432 **
## TtRev_Growth_Rate 0.09282 0.12342 0.752 0.47353
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.712 on 8 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.06604, Adjusted R-squared: -0.05071
## F-statistic: 0.5657 on 1 and 8 DF, p-value: 0.4735
ggplot(Cleaned_TaMA_Data, aes(x = TtRev_Growth_Rate, y = Capital_Exp_Per_Capita)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE)+
labs(title = "Revenue Growth vs. Capital Expenditure (Per Capita)",
x = "Total Revenue Growth Rate (%)",
y = "Capital Expenditure Per Capita")
The regression results show no statistically significant relationship between total revenue growth rate and infrastructure delivery (capital expenditure per capita): the p-value (0.4735) is greater than the 0.05 significance level. This means that changes in revenue growth do not significantly predict changes in capital expenditure per capita in this model. The R-squared (0.06604) indicates that only about 6.6% of the variation in capital expenditure per capita can be explained by revenue growth (total revenue growth rate).
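The figures quoted from a model summary can be cross-checked against each other. As a minimal sketch, the F-statistic and p-value for `mod5` are recomputed below from the R-squared and degrees of freedom printed in its summary output; the variable names are illustrative.

```r
# Values copied from the summary output of mod5
r_squared <- 0.06604
df_model  <- 1   # one predictor
df_resid  <- 8   # residual degrees of freedom

# F = (R^2 / df_model) / ((1 - R^2) / df_resid)
f_stat <- (r_squared / df_model) / ((1 - r_squared) / df_resid)

# Upper-tail probability of the F distribution gives the model p-value
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 4)
```

Both values agree with the printed summary (F-statistic: 0.5657, p-value: 0.4735).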
Cleaned_TaMA_Data$Expenditure_Growth <- c(NA, diff(Cleaned_TaMA_Data$Total_Expenditure) / Cleaned_TaMA_Data$Total_Expenditure[-nrow(Cleaned_TaMA_Data)]) * 100
mod6 <- lm(Capital_Exp_Per_Capita ~ Expenditure_Growth, data = Cleaned_TaMA_Data)
summary(mod6)
##
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ Expenditure_Growth, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5176 -4.3263 -2.9899 0.8945 16.4278
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.88510 2.52264 3.919 0.00443 **
## Expenditure_Growth 0.14099 0.06878 2.050 0.07452 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.299 on 8 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.3444, Adjusted R-squared: 0.2624
## F-statistic: 4.202 on 1 and 8 DF, p-value: 0.07452
ggplot(Cleaned_TaMA_Data, aes(x = Expenditure_Growth, y = Capital_Exp_Per_Capita)) +
geom_point() + geom_smooth(method = "lm", se = TRUE)+
labs(title = "Expenditure Growth vs. Capital Expenditure (Per Capita)",
x = "Expenditure Growth Rate (%)",
y = "Capital Expenditure Per Capita")
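From the linear regression results, there is no statistically significant relationship between expenditure growth and capital expenditure per capita at the 0.05 level (p-value: 0.07452, R-squared: 0.3444), although the relationship is marginally significant at the 0.10 level.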
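The growth-rate predictor used above is a year-over-year percentage change. The sketch below applies the same `diff()` pattern to a small made-up vector so the mechanics are easy to follow; `growth_rate` and `spend` are hypothetical names.

```r
# Year-over-year percentage change: (x[t] - x[t-1]) / x[t-1] * 100
# The first year has no predecessor, so its growth rate is NA
growth_rate <- function(x) {
  c(NA, diff(x) / x[-length(x)]) * 100
}

# Made-up series: up 10% in year 2, down 10% in year 3
spend <- c(100, 110, 99)
growth_rate(spend)
```

The leading `NA` is why the growth-rate regressions report "1 observation deleted due to missingness" and run on 10 rather than 11 observations.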
# no variables
# Expenditure Composition:
Cleaned_TaMA_Data$CapExp_Pct <- (Cleaned_TaMA_Data$Capital_Expenditure / Cleaned_TaMA_Data$Total_Expenditure)
Cleaned_TaMA_Data$CapExp_Rev_Ratio <- (Cleaned_TaMA_Data$Capital_Expenditure / Cleaned_TaMA_Data$Total_Revenue)
# Expenditure Composition
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = CapExp_Pct)) +
geom_bar(stat = "identity", fill = "dodgerblue") +
geom_point()+
labs(title = "Capital Expenditure as Percentage of Total Expenditure",
x = "Year",
y = "Percentage") +
scale_y_continuous(labels = percent_format(accuracy = 1))
# Trends of Revenue and Expenditure over the years.
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue, color = "Total Revenue")) +
geom_point(aes(y = Total_Revenue, color = "Total Revenue")) +
geom_line(aes(y = Total_Expenditure, color = "Total Expenditure")) +
geom_point(aes(y = Total_Expenditure, color = "Total Expenditure")) +
labs(title = "Revenue and Expenditure Trends Over Years",
x = "Year",
y = "Amount (Ghana Cedis)", color = "Type") +
scale_color_manual(values = c("Total Revenue" = "blue", "Total Expenditure" = "red")) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue, color = "Total Revenue"), size = 1) +
geom_line(aes(y = IGF, color = "IGF"), size = 1) +
geom_line(aes(y = DACF, color = "DACF"), size = 1) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure"), size = 1) +
geom_line(aes(y = Recrrent_Expenditure , color = "Recurrent Expenditure"), size = 1) +
geom_line(aes(y = Total_Expenditure, color = "Total Expenditure"), size = 1) +
geom_line(aes(y = Others_Sources, color = "Other Sources"), size = 1) +
labs(
title = "Revenue and Expenditure Trends",
x = "Year",
y = "Amount (Ghana Cedis)",
color = "Type"
) +
scale_color_manual(
values = c(
"Total Revenue" = "blue",
"Other Sources" = "skyblue",
"IGF" = "green",
"DACF" = "darkgray",
"Capital Expenditure" = "purple",
"Total Expenditure" = "red",
"Recurrent Expenditure" = "yellow"
)
) +
scale_y_continuous(labels = scales::comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# IGF to Total Expenditure Ratio
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = IGF_TE)) +
geom_line(color = "steelblue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "IGF to Total Expenditure Ratio Over Years",
x = "Year",
y = "Ratio (IGF/Total Expenditure)"
) +
scale_y_continuous(labels = percent_format(accuracy = 1))
# CapExp_Rev_Ratio plot.
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = CapExp_Rev_Ratio)) +
geom_line(color = "steelblue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Capital Expenditure to Total Revenue Ratio Over Years",
x = "Year",
y = "Ratio (Capital Expenditure/Total Revenue)"
) +
scale_y_continuous(labels = comma)
cor.test(Cleaned_TaMA_Data$Total_Expenditure, Cleaned_TaMA_Data$Total_Revenue)
##
## Pearson's product-moment correlation
##
## data: Cleaned_TaMA_Data$Total_Expenditure and Cleaned_TaMA_Data$Total_Revenue
## t = 9.1392, df = 9, p-value = 0.00000753
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8144349 0.9872874
## sample estimates:
## cor
## 0.9501201
In the above plots, capital expenditure as a percentage of total expenditure shows a relatively high share of capital investment, with a peak around 2014, followed by a sustained decline after 2016. There is also a strong positive correlation between Total Revenue and Total Expenditure (r = 0.95, p-value: 0.00000753), with both peaking around 2016 and falling afterwards.
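The test statistic and confidence interval reported by `cor.test()` follow directly from the correlation coefficient and the sample size. The sketch below recomputes them from the printed estimate, using the standard t transformation and the Fisher z transformation; the variable names are illustrative.

```r
# Values copied from the cor.test output above
r <- 0.9501201   # Pearson correlation
n <- 11          # number of yearly observations (df = n - 2 = 9)

# t statistic for H0: true correlation equals 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z transformation
z          <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci         <- tanh(c(z - half_width, z + half_width))

round(c(t = t_stat, lower = ci[1], upper = ci[2]), 4)
```

With only 11 observations the interval is wide (about 0.81 to 0.99), but it stays well above zero.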
# Revenue Per Capita
Cleaned_TaMA_Data$Total_Revenue_Per_Capita <- Cleaned_TaMA_Data$Total_Revenue / Cleaned_TaMA_Data$Population
Cleaned_TaMA_Data$IGF_Per_Capita <- Cleaned_TaMA_Data$IGF / Cleaned_TaMA_Data$Population
Cleaned_TaMA_Data$DACF_Per_Capita <- Cleaned_TaMA_Data$DACF / Cleaned_TaMA_Data$Population
# Time Series Plots (Improved)
# Total Revenue and Expenditure Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue, color = "Total Revenue"), size = 1) +
geom_point(aes(y = Total_Revenue, color = "Total Revenue")) +
geom_line(aes(y = IGF, color = "IGF"), size = 1) +
geom_point(aes(y = IGF, color = "IGF")) +
geom_line(aes(y = DACF, color = "DACF"), size = 1) +
geom_point(aes(y = DACF, color = "DACF")) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure"), size = 1) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure"), size = 1) +
geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
geom_line(aes(y = Total_Expenditure, color = "Total Expenditure"), size = 1) +
geom_point(aes(y = Total_Expenditure, color = "Total Expenditure")) +
geom_line(aes(y = Others_Sources, color = "Other Sources"), size = 1) +
geom_point(aes(y = Others_Sources, color = "Other Sources")) +
labs(
title = "Revenue and Expenditure Trends Over Years",
x = "Year",
y = "Amount (Ghana Cedis)",
color = "Type"
) +
scale_color_manual(
values = c(
"Total Revenue" = "blue",
"Other Sources" = "skyblue",
"IGF" = "green",
"DACF" = "darkgray",
"Capital Expenditure" = "purple",
"Total Expenditure" = "red",
"Recurrent Expenditure" = "yellow"
)
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Population Trend
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = Population)) +
geom_line(color = "steelblue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Population Trend Over Years",
x = "Year",
y = "Population"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# IGF to Total Expenditure Ratio
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = IGF_TE)) +
geom_line(color = "steelblue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "IGF to Total Expenditure Ratio Over Years",
x = "Year",
y = "Ratio (IGF/Total Expenditure)"
) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Per capita plot
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
geom_point(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
geom_line(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
geom_point(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
geom_line(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
geom_point(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
labs(title = "Revenue Per Capita trends", x = "Year", y = "Amount (Ghana Cedis)", color = "Type") +
scale_y_continuous(labels = comma)
cor_matrix <- cor(Cleaned_TaMA_Data[, c("Population", "Total_Revenue", "Total_Expenditure", "IGF_TE", "CapExp_Pct", "IGF")], use = "complete.obs")
print(cor_matrix)
## Population Total_Revenue Total_Expenditure IGF_TE
## Population 1.0000000 0.7194963 0.6545468 0.2825390
## Total_Revenue 0.7194963 1.0000000 0.9501201 -0.2968282
## Total_Expenditure 0.6545468 0.9501201 1.0000000 -0.4695060
## IGF_TE 0.2825390 -0.2968282 -0.4695060 1.0000000
## CapExp_Pct -0.8345614 -0.3341015 -0.2761764 -0.5350044
## IGF 0.9508158 0.7059292 0.5921264 0.4213178
## CapExp_Pct IGF
## Population -0.8345614 0.9508158
## Total_Revenue -0.3341015 0.7059292
## Total_Expenditure -0.2761764 0.5921264
## IGF_TE -0.5350044 0.4213178
## CapExp_Pct 1.0000000 -0.7987978
## IGF -0.7987978 1.0000000
corrplot(cor_matrix, main = "Correlation matrix of population and expenditure patterns")
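In the above, there is a strong positive correlation between total revenue and total expenditure (0.95) and also between IGF and population (0.95). CapExp_Pct is strongly negatively correlated with both population (-0.83) and IGF (-0.80).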
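With a larger matrix, the strongest pairs can be pulled out programmatically rather than read off by eye. The sketch below does this for four of the variables, with the correlations typed in from the printed matrix above. The 0.9 cut-off is an arbitrary illustrative threshold, and `cm` and `strong` are hypothetical names.

```r
# Correlations copied from the printed matrix (four of the six variables)
vars <- c("Population", "Total_Revenue", "Total_Expenditure", "IGF")
cm <- matrix(
  c(1,         0.7194963, 0.6545468, 0.9508158,
    0.7194963, 1,         0.9501201, 0.7059292,
    0.6545468, 0.9501201, 1,         0.5921264,
    0.9508158, 0.7059292, 0.5921264, 1),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Keep each pair once (upper triangle) and flag |r| above the threshold
idx <- which(abs(cm) > 0.9 & upper.tri(cm), arr.ind = TRUE)
strong <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx],
  row.names = NULL
)
strong
```

Two pairs pass the threshold: Total_Revenue with Total_Expenditure, and Population with IGF.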
# Total Revenue vs Population
model_revenue_pop <- lm(Total_Revenue ~ Population, data = Cleaned_TaMA_Data)
summary(model_revenue_pop)
##
## Call:
## lm(formula = Total_Revenue ~ Population, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3868956 -1745651 -1182929 1274374 5837028
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3297051.78 6117612.55 -0.539 0.6030
## Population 33.95 10.92 3.108 0.0126 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3381000 on 9 degrees of freedom
## Multiple R-squared: 0.5177, Adjusted R-squared: 0.4641
## F-statistic: 9.66 on 1 and 9 DF, p-value: 0.01256
# Total Expenditure vs Population
model_expenditure_pop <- lm(Total_Expenditure ~ Population, data = Cleaned_TaMA_Data)
summary(model_expenditure_pop)
##
## Call:
## lm(formula = Total_Expenditure ~ Population, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3711913 -2764234 -512930 318938 8909224
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4000822.24 7428256.85 -0.539 0.6032
## Population 34.45 13.26 2.597 0.0289 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4105000 on 9 degrees of freedom
## Multiple R-squared: 0.4284, Adjusted R-squared: 0.3649
## F-statistic: 6.746 on 1 and 9 DF, p-value: 0.02886
# Capital Expenditure vs Total Revenue and IGF_TE
model_capital_rev_igf <- lm(Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_TaMA_Data)
summary(model_capital_rev_igf)
##
## Call:
## lm(formula = Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2776838 -1487355 16342 1565100 2945904
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17265329.1768 4379246.4356 3.943 0.00428 **
## Total_Revenue 0.1121 0.1589 0.706 0.50040
## IGF_TE -103977254.9573 23393596.4180 -4.445 0.00215 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2216000 on 8 degrees of freedom
## Multiple R-squared: 0.752, Adjusted R-squared: 0.6899
## F-statistic: 12.13 on 2 and 8 DF, p-value: 0.003786
# IGF_TE vs Population and Total Revenue
model_igfte_pop_rev <- lm(IGF_TE ~ Population + Total_Revenue, data = Cleaned_TaMA_Data)
summary(model_igfte_pop_rev)
##
## Call:
## lm(formula = IGF_TE ~ Population + Total_Revenue, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.02625 -0.01534 -0.00533 0.01590 0.03440
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.051157749331 0.040855907353 1.252 0.2459
## Population 0.000000329638 0.000000103393 3.188 0.0128 *
## Total_Revenue -0.000000007042 0.000000002191 -3.214 0.0124 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02222 on 8 degrees of freedom
## Multiple R-squared: 0.5984, Adjusted R-squared: 0.498
## F-statistic: 5.96 on 2 and 8 DF, p-value: 0.02602
# Visualizations
# Scatter plot: Total Revenue vs Population
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = Total_Revenue)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Total Revenue vs Population", x = "Population", y = "Total Revenue") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
# Scatter plot: Total Expenditure vs Population
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = Total_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Total Expenditure vs Population", x = "Population", y = "Total Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
# Scatter plot: Capital Expenditure vs Total Revenue
ggplot(Cleaned_TaMA_Data, aes(x = Total_Revenue, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital Expenditure vs Total Revenue", x = "Total Revenue", y = "Capital Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
# Scatter plot: IGF_TE vs Population
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = IGF_TE)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF_TE vs Population", x = "Population", y = "IGF_TE") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = percent_format(accuracy = 1))
ggplot(Cleaned_TaMA_Data, aes(x = Total_Revenue, y = IGF_TE)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF_TE vs Total Revenue", x = "Total Revenue", y = "IGF_TE") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = percent_format(accuracy = 1))
The regression results above show statistically significant linear relationships between Total Revenue and Population, between Total Expenditure and Population, between Capital Expenditure and Total Revenue, and between IGF_TE and the two predictors Population and Total Revenue (F-test p-value: 0.02602, R-squared: 0.5984).
# no variables
# IGF Trend
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = IGF)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "IGF Trend Over Years",
x = "Year",
y = "IGF (Ghana Cedis)"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_point(aes(y = Act_Permit, color = "Permit Fees")) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_point(aes(y = Act_Property_Rates, color = "Property Rates")) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_point(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue")) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_point(aes(y = Act_Licenses, color = "Licenses")) +
geom_line(aes(y = Act_Fees, color = "Act Fees"), size = 1) +
geom_point(aes(y = Act_Fees, color = "Act Fees")) +
labs(
title = "Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Set1")+
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# IGF and Land-Based Revenue Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = IGF, color = "IGF"), size = 1) +
geom_point(aes(y = IGF, color = "IGF")) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_point(aes(y = Act_Permit, color = "Permit Fees")) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_point(aes(y = Act_Property_Rates, color = "Property Rates")) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_point(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue")) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_point(aes(y = Act_Licenses, color = "Licenses")) +
geom_line(aes(y = Act_Fees, color = "Act Fees"), size = 1) +
geom_point(aes(y = Act_Fees, color = "Act Fees")) +
labs(
title = "IGF vs. Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Set1")+
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plots above show how IGF and the individual land-based revenue streams trend over the years.
# IGF vs Land-Based Revenues
model_igf_land <- lm(IGF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_TaMA_Data)
summary(model_igf_land)
##
## Call:
## lm(formula = IGF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_TaMA_Data)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9 10 11
## -40280 -35663 -34672 38886 65915 23974 6768 -1364 16663 -20912 -19317
## attr(,"label")
## [1] "IGF"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48668.7984 56928.5232 0.855 0.43165
## Act_Permit 1.2123 0.5223 2.321 0.06795 .
## Act_Property_Rates 0.5986 0.1646 3.636 0.01497 *
## Act_Stool_Lands -0.6931 0.8653 -0.801 0.45945
## Act_Licenses 1.9825 0.2128 9.317 0.00024 ***
## Act_Fees 0.5239 0.3852 1.360 0.23191
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 48300 on 5 degrees of freedom
## Multiple R-squared: 0.9975, Adjusted R-squared: 0.9949
## F-statistic: 394.7 on 5 and 5 DF, p-value: 0.000001739
cor_matrix_land_igf <- cor(Cleaned_TaMA_Data[, c("IGF", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_igf)
## IGF Act_Permit Act_Property_Rates Act_Stool_Lands
## IGF 1.0000000 0.6437500 0.8995754 0.6988960
## Act_Permit 0.6437500 1.0000000 0.7570547 0.1288664
## Act_Property_Rates 0.8995754 0.7570547 1.0000000 0.4519784
## Act_Stool_Lands 0.6988960 0.1288664 0.4519784 1.0000000
## Act_Licenses 0.9755675 0.4923371 0.7991663 0.7974832
## Act_Fees 0.9394206 0.6205540 0.8356292 0.6528500
## Act_Licenses Act_Fees
## IGF 0.9755675 0.9394206
## Act_Permit 0.4923371 0.6205540
## Act_Property_Rates 0.7991663 0.8356292
## Act_Stool_Lands 0.7974832 0.6528500
## Act_Licenses 1.0000000 0.9066815
## Act_Fees 0.9066815 1.0000000
corrplot(cor_matrix_land_igf)
In the multiple regression of IGF on the five land-based revenue streams (permit fees, property rates, stool lands revenue, licenses, and fees), the overall model is statistically significant (F-test p-value: 0.000001739) with an R-squared of 0.9975, meaning that 99.75% of the variation in IGF is explained by these revenues. Among the individual terms, however, only property rates (p-value: 0.01497) and licenses (p-value: 0.00024) are significant at the 5% level; permit fees are marginal (p-value: 0.06795), and stool lands revenue and fees are not significant.
The correlation matrix shows that IGF is strongly correlated with licenses (0.98), fees (0.94), and property rates (0.90), and more moderately with stool lands revenue (0.70) and permit fees (0.64).
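Several of the predictors are strongly correlated with one another (licenses and fees at 0.91, for example), which may help explain why some terms lose significance in the joint model even though the overall fit is high. One way to gauge this, offered here as a sketch and not as part of the original analysis, is the variance inflation factor (VIF). The VIFs equal the diagonal of the inverse of the predictor correlation matrix, so they can be computed directly from the correlations printed above. The object names `pred_cor` and `vif` are illustrative.

```r
# Predictor-only correlation matrix, copied from the printed output above
# (the IGF row and column are dropped). Object names are illustrative.
vars <- c("Act_Permit", "Act_Property_Rates", "Act_Stool_Lands",
          "Act_Licenses", "Act_Fees")
pred_cor <- matrix(c(
  1.0000000, 0.7570547, 0.1288664, 0.4923371, 0.6205540,
  0.7570547, 1.0000000, 0.4519784, 0.7991663, 0.8356292,
  0.1288664, 0.4519784, 1.0000000, 0.7974832, 0.6528500,
  0.4923371, 0.7991663, 0.7974832, 1.0000000, 0.9066815,
  0.6205540, 0.8356292, 0.6528500, 0.9066815, 1.0000000
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# VIF for predictor j is the j-th diagonal element of the inverse matrix
vif <- diag(solve(pred_cor))
print(round(vif, 2))
```

Values well above 1 suggest that a coefficient is estimated imprecisely because its predictor overlaps with the others; a common rule of thumb treats values above 5 or 10 as a concern.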
# Simple linear Regression Analysis
model_permit <- lm(IGF ~ Act_Permit, data = Cleaned_TaMA_Data)
model_property <- lm(IGF ~ Act_Property_Rates, data = Cleaned_TaMA_Data)
model_stool <- lm(IGF ~ Act_Stool_Lands, data = Cleaned_TaMA_Data)
model_license <- lm(IGF ~ Act_Licenses, data = Cleaned_TaMA_Data)
model_acts <- lm(IGF ~ Act_Fees, data = Cleaned_TaMA_Data)
# Visualizations
# Scatter plots (IGF vs each land-based revenue)
ggplot(Cleaned_TaMA_Data, aes(x = Act_Permit, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Permit Fees", x = "Permit Fees", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = IGF ~ Act_Permit, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -727227 -459803 236089 428312 623386
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 582539.097 509745.401 1.143 0.2826
## Act_Permit 8.791 3.483 2.524 0.0326 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 548000 on 9 degrees of freedom
## Multiple R-squared: 0.4144, Adjusted R-squared: 0.3493
## F-statistic: 6.369 on 1 and 9 DF, p-value: 0.03257
ggplot(Cleaned_TaMA_Data, aes(x = Act_Property_Rates, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Property Rates", x = "Property Rates", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = IGF ~ Act_Property_Rates, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -312851 -249862 46865 161418 558612
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 581390.5708 218542.0645 2.660 0.026032 *
## Act_Property_Rates 2.7118 0.4389 6.179 0.000163 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 312800 on 9 degrees of freedom
## Multiple R-squared: 0.8092, Adjusted R-squared: 0.788
## F-statistic: 38.18 on 1 and 9 DF, p-value: 0.0001629
ggplot(Cleaned_TaMA_Data, aes(x = Act_Stool_Lands, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = IGF ~ Act_Stool_Lands, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -536057 -297926 -29170 74473 989992
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 923587.128 336353.649 2.746 0.0226 *
## Act_Stool_Lands 13.549 4.622 2.932 0.0167 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 512200 on 9 degrees of freedom
## Multiple R-squared: 0.4885, Adjusted R-squared: 0.4316
## F-statistic: 8.594 on 1 and 9 DF, p-value: 0.01672
ggplot(Cleaned_TaMA_Data, aes(x = Act_Licenses, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Licenses", x = "Licenses", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = IGF ~ Act_Licenses, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -170249 -105590 -21287 47670 280311
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 186599.561 130040.291 1.435 0.185
## Act_Licenses 2.664 0.200 13.321 0.000000315 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 157300 on 9 degrees of freedom
## Multiple R-squared: 0.9517, Adjusted R-squared: 0.9464
## F-statistic: 177.5 on 1 and 9 DF, p-value: 0.0000003148
ggplot(Cleaned_TaMA_Data, aes(x = Act_Fees, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Act Fees", x = "Act Fees", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = IGF ~ Act_Fees, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -358751 -227092 69947 145759 351559
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14635.4783 229355.6094 0.064 0.951
## Act_Fees 5.8889 0.7162 8.222 0.0000178 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 245500 on 9 degrees of freedom
## Multiple R-squared: 0.8825, Adjusted R-squared: 0.8695
## F-statistic: 67.6 on 1 and 9 DF, p-value: 0.00001777
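Each of the five simple linear regressions of IGF on a single land-based revenue is statistically significant at the 5% level, and every slope is positive. The strength of the fit varies widely, though: licenses (R-squared 0.9517), fees (0.8825), and property rates (0.8092) explain most of the variation in IGF, while stool lands revenue (0.4885) and permit fees (0.4144) explain less than half.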
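In a one-predictor regression the R-squared is the square of the correlation coefficient, and the slope's t statistic follows from the same correlation. The sketch below recomputes both from the IGF correlations printed earlier, as a cross-check on the five summaries; it assumes n = 11 observations, as the 9 residual degrees of freedom imply.

```r
# Correlations of IGF with each predictor, copied from the matrix printed earlier
r <- c(Act_Permit = 0.6437500, Act_Property_Rates = 0.8995754,
       Act_Stool_Lands = 0.6988960, Act_Licenses = 0.9755675,
       Act_Fees = 0.9394206)
n <- 11  # yearly observations (assumed from the 9 residual degrees of freedom)

r_squared <- r^2                                # Multiple R-squared of each simple model
t_value   <- r * sqrt(n - 2) / sqrt(1 - r^2)    # t statistic for the slope
p_value   <- 2 * pt(-abs(t_value), df = n - 2)  # two-sided p-value

print(round(cbind(r, r_squared, t_value, p_value), 4))
```

The recomputed values should agree with the `summary()` output up to rounding, which shows that significance and strength are separate questions: permit fees are significant at the 5% level even though they explain only about 41% of the variation.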
# DACF Trend
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = DACF)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "DACF Trend Over Years",
x = "Year",
y = "DACF (Ghana Cedis)"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
labs(
title = "Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# DACF and Land-Based Revenue Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
geom_line(aes(y = DACF, color = "DACF"), size = 1) +
labs(
title = "DACF vs. Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plots above show how DACF and the individual land-based revenue streams trend over the years.
# DACF vs Land-Based Revenues
model_DACF_land <- lm(DACF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_TaMA_Data)
summary(model_DACF_land)
##
## Call:
## lm(formula = DACF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_TaMA_Data)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## -646337 -561377 -1082402 1795148 636064 1031236 -325750 -278831
## 9 10 11
## -347963 -185386 -34402
## attr(,"label")
## [1] "DACF"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -250260.983 1388185.695 -0.180 0.864
## Act_Permit 4.480 12.736 0.352 0.739
## Act_Property_Rates -1.130 4.014 -0.281 0.790
## Act_Stool_Lands 4.431 21.099 0.210 0.842
## Act_Licenses 2.934 5.189 0.565 0.596
## Act_Fees 3.348 9.393 0.356 0.736
##
## Residual standard error: 1178000 on 5 degrees of freedom
## Multiple R-squared: 0.6401, Adjusted R-squared: 0.2802
## F-statistic: 1.779 on 5 and 5 DF, p-value: 0.2713
cor_matrix_land_DACF <- cor(Cleaned_TaMA_Data[, c("DACF", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_DACF)
## DACF Act_Permit Act_Property_Rates Act_Stool_Lands
## DACF 1.0000000 0.4571057 0.6271925 0.6394902
## Act_Permit 0.4571057 1.0000000 0.7570547 0.1288664
## Act_Property_Rates 0.6271925 0.7570547 1.0000000 0.4519784
## Act_Stool_Lands 0.6394902 0.1288664 0.4519784 1.0000000
## Act_Licenses 0.7843147 0.4923371 0.7991663 0.7974832
## Act_Fees 0.7573710 0.6205540 0.8356292 0.6528500
## Act_Licenses Act_Fees
## DACF 0.7843147 0.7573710
## Act_Permit 0.4923371 0.6205540
## Act_Property_Rates 0.7991663 0.8356292
## Act_Stool_Lands 0.7974832 0.6528500
## Act_Licenses 1.0000000 0.9066815
## Act_Fees 0.9066815 1.0000000
corrplot(cor_matrix_land_DACF)
The multiple regression of DACF on the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees) is not statistically significant (p-value: 0.2713). The R-squared is 0.6401, but the adjusted R-squared is only 0.2802, which indicates that the model fits poorly once the number of predictors is taken into account. None of the individual terms is significant either.
The correlation matrix shows that DACF is moderately correlated with the land-based revenues, with coefficients ranging from 0.46 (permit fees) to 0.78 (licenses).
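The gap between the R-squared and the adjusted R-squared follows from the small sample: with 11 observations and 5 predictors, only 5 residual degrees of freedom remain. The sketch below recomputes the adjusted R-squared, F statistic, and overall p-value from the reported R-squared alone. The helper `overall_fit` is a hypothetical name introduced for illustration.

```r
# Recompute overall-fit statistics from R-squared, sample size n,
# and number of predictors k. `overall_fit` is an illustrative helper.
overall_fit <- function(r2, n, k) {
  df1 <- k
  df2 <- n - k - 1
  f_stat <- (r2 / df1) / ((1 - r2) / df2)
  c(adj_r2  = 1 - (1 - r2) * (n - 1) / df2,
    f_stat  = f_stat,
    p_value = pf(f_stat, df1, df2, lower.tail = FALSE))
}

# Values taken from the DACF model output above
dacf_fit <- overall_fit(r2 = 0.6401, n = 11, k = 5)
print(round(dacf_fit, 4))
```

Because the input R-squared is rounded to four decimals, the results match the printed summary only approximately.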
# Simple linear Regression Analysis
model_permit <- lm(DACF ~ Act_Permit, data = Cleaned_TaMA_Data)
model_property <- lm(DACF ~ Act_Property_Rates, data = Cleaned_TaMA_Data)
model_stool <- lm(DACF ~ Act_Stool_Lands, data = Cleaned_TaMA_Data)
model_license <- lm(DACF ~ Act_Licenses, data = Cleaned_TaMA_Data)
model_acts <- lm(DACF ~ Act_Fees, data = Cleaned_TaMA_Data)
# Scatter plots (DACF vs each land-based revenue)
ggplot(Cleaned_TaMA_Data, aes(x = Act_Permit, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Permit Fees", x = "Permit Fees", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = DACF ~ Act_Permit, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1928126 -1065845 529177 738779 1648497
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1173775.071 1210657.536 0.970 0.358
## Act_Permit 12.756 8.273 1.542 0.158
##
## Residual standard error: 1302000 on 9 degrees of freedom
## Multiple R-squared: 0.2089, Adjusted R-squared: 0.1211
## F-statistic: 2.377 on 1 and 9 DF, p-value: 0.1575
ggplot(Cleaned_TaMA_Data, aes(x = Act_Property_Rates, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Property Rates", x = "Property Rates", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = DACF ~ Act_Property_Rates, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1492047 -762705 -158813 810478 1799052
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1204113.767 796358.381 1.512 0.1648
## Act_Property_Rates 3.863 1.599 2.416 0.0389 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1140000 on 9 degrees of freedom
## Multiple R-squared: 0.3934, Adjusted R-squared: 0.326
## F-statistic: 5.836 on 1 and 9 DF, p-value: 0.03888
ggplot(Cleaned_TaMA_Data, aes(x = Act_Stool_Lands, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = DACF ~ Act_Stool_Lands, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1543231 -879120 -118859 706050 1790413
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1301804.43 738801.17 1.762 0.1119
## Act_Stool_Lands 25.33 10.15 2.495 0.0341 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1125000 on 9 degrees of freedom
## Multiple R-squared: 0.4089, Adjusted R-squared: 0.3433
## F-statistic: 6.227 on 1 and 9 DF, p-value: 0.03412
ggplot(Cleaned_TaMA_Data, aes(x = Act_Licenses, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Licenses", x = "Licenses", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = DACF ~ Act_Licenses, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -942517 -597254 -177530 443539 1828251
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 289820.356 750333.986 0.386 0.70828
## Act_Licenses 4.377 1.154 3.793 0.00426 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 907800 on 9 degrees of freedom
## Multiple R-squared: 0.6151, Adjusted R-squared: 0.5724
## F-statistic: 14.39 on 1 and 9 DF, p-value: 0.004264
ggplot(Cleaned_TaMA_Data, aes(x = Act_Fees, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Act Fees", x = "Act Fees", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = DACF ~ Act_Fees, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1632350 -410338 -469 452383 1898842
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -906.982 892840.595 -0.001 0.99921
## Act_Fees 9.702 2.788 3.480 0.00694 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 955500 on 9 degrees of freedom
## Multiple R-squared: 0.5736, Adjusted R-squared: 0.5262
## F-statistic: 12.11 on 1 and 9 DF, p-value: 0.006943
In the simple linear regressions of DACF on each land-based revenue, every model is significant at the 5% level except the one for permit fees (p-value: 0.1575).
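The simple regressions reuse the object names `model_permit`, `model_property`, and so on for every response, so each new batch overwrites the previous fits. A possible alternative, sketched here with randomly generated stand-in data and not the actual `Cleaned_TaMA_Data`, is to fit all five models in one pass and collect the key statistics in a single table. The names `demo`, `fit_simple_models`, and `simple_summary` are hypothetical.

```r
# Stand-in data: 11 rows with the document's column names but RANDOM values,
# used only so that the sketch runs on its own.
set.seed(1)
demo <- data.frame(
  DACF               = runif(11, 1e6, 5e6),
  Act_Permit         = runif(11, 5e4, 2e5),
  Act_Property_Rates = runif(11, 2e5, 8e5),
  Act_Stool_Lands    = runif(11, 2e4, 1e5),
  Act_Licenses       = runif(11, 3e5, 1e6),
  Act_Fees           = runif(11, 1e5, 5e5)
)
predictors <- c("Act_Permit", "Act_Property_Rates", "Act_Stool_Lands",
                "Act_Licenses", "Act_Fees")

# Fit one simple regression per predictor and return a named list of models
fit_simple_models <- function(response, predictors, data) {
  fits <- lapply(predictors, function(p) {
    lm(reformulate(p, response = response), data = data)
  })
  names(fits) <- predictors
  fits
}

dacf_fits <- fit_simple_models("DACF", predictors, demo)

# One row per predictor: slope, p-value of the slope, and R-squared
simple_summary <- do.call(rbind, lapply(names(dacf_fits), function(p) {
  s <- summary(dacf_fits[[p]])
  data.frame(predictor = p,
             slope     = s$coefficients[2, "Estimate"],
             p_value   = s$coefficients[2, "Pr(>|t|)"],
             r_squared = s$r.squared)
}))
print(simple_summary)
```

Swapping `demo` for the real data frame and changing the `response` argument would give the same table for IGF or the expenditure variables without overwriting earlier fits.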
# Capital_Expenditure Trend
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = Capital_Expenditure)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Capital Expenditure Trend Over Years",
x = "Year",
y = "Capital_Expenditure (Ghana Cedis)"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
labs(
title = "Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Capital_Expenditure and Land-Based Revenue Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
geom_line(aes(y = Capital_Expenditure, color = "Capital_Expenditure"), size = 1) +
labs(
title = "Capital Exp. vs. Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plots above show how Capital Expenditure and the individual land-based revenue streams trend over the years.
# Capital_Expenditure vs Land-Based Revenues
model_Capital_Expenditure_land <- lm(Capital_Expenditure ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_TaMA_Data)
summary(model_Capital_Expenditure_land)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Permit + Act_Property_Rates +
## Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_TaMA_Data)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## -4261971 -4650178 -1657329 4445257 6060357 2358913 1412538 -544287
## 9 10 11
## 187335 -1508719 -1841916
## attr(,"label")
## [1] "Capital Expenditure"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7892361.4277 5592991.5864 1.411 0.217
## Act_Permit 31.6561 51.3136 0.617 0.564
## Act_Property_Rates 0.4514 16.1743 0.028 0.979
## Act_Stool_Lands 2.6662 85.0093 0.031 0.976
## Act_Licenses -9.1549 20.9064 -0.438 0.680
## Act_Fees -3.4384 37.8449 -0.091 0.931
##
## Residual standard error: 4745000 on 5 degrees of freedom
## Multiple R-squared: 0.2889, Adjusted R-squared: -0.4222
## F-statistic: 0.4063 on 5 and 5 DF, p-value: 0.8273
cor_matrix_land_Capital_Expenditure <- cor(Cleaned_TaMA_Data[, c("Capital_Expenditure", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_Capital_Expenditure)
## Capital_Expenditure Act_Permit Act_Property_Rates
## Capital_Expenditure 1.00000000 0.07827347 -0.1998323
## Act_Permit 0.07827347 1.00000000 0.7570547
## Act_Property_Rates -0.19983228 0.75705468 1.0000000
## Act_Stool_Lands -0.43152388 0.12886637 0.4519784
## Act_Licenses -0.42322394 0.49233712 0.7991663
## Act_Fees -0.33027140 0.62055400 0.8356292
## Act_Stool_Lands Act_Licenses Act_Fees
## Capital_Expenditure -0.4315239 -0.4232239 -0.3302714
## Act_Permit 0.1288664 0.4923371 0.6205540
## Act_Property_Rates 0.4519784 0.7991663 0.8356292
## Act_Stool_Lands 1.0000000 0.7974832 0.6528500
## Act_Licenses 0.7974832 1.0000000 0.9066815
## Act_Fees 0.6528500 0.9066815 1.0000000
corrplot(cor_matrix_land_Capital_Expenditure)
The multiple regression of Capital_Expenditure on the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees) is not statistically significant (p-value: 0.8273), with an R-squared of 0.2889 and an adjusted R-squared of -0.4222. None of the individual terms is significant either.
The correlation matrix shows that Capital_Expenditure is only weakly correlated with the land-based revenues: the coefficients range from -0.43 (stool lands revenue) to 0.08 (permit fees), and all but the one for permit fees are negative.
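With only 11 observations, a correlation has to be fairly large before it differs significantly from zero. As a rough check, assuming a two-sided test at the 5% level, the sketch below derives the critical correlation for this sample size and compares it with the Capital_Expenditure correlations printed above.

```r
n  <- 11                     # observations in the data set
df <- n - 2
t_crit <- qt(0.975, df)      # two-sided 5% critical t value
r_crit <- t_crit / sqrt(t_crit^2 + df)  # smallest |r| that is significant

# Correlations of Capital_Expenditure with each revenue, from the matrix above
r <- c(Act_Permit = 0.07827347, Act_Property_Rates = -0.1998323,
       Act_Stool_Lands = -0.4315239, Act_Licenses = -0.4232239,
       Act_Fees = -0.3302714)

print(round(r_crit, 3))
print(abs(r) > r_crit)       # TRUE would mark a significant correlation
```

None of the five correlations reaches the threshold, so individually significant simple regressions on Capital_Expenditure would not be expected from these data.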
# Simple linear Regression Analysis
model_permit <- lm(Capital_Expenditure ~ Act_Permit, data = Cleaned_TaMA_Data)
model_property <- lm(Capital_Expenditure ~ Act_Property_Rates, data = Cleaned_TaMA_Data)
model_stool <- lm(Capital_Expenditure ~ Act_Stool_Lands, data = Cleaned_TaMA_Data)
model_license <- lm(Capital_Expenditure ~ Act_Licenses, data = Cleaned_TaMA_Data)
model_acts <- lm(Capital_Expenditure ~ Act_Fees, data = Cleaned_TaMA_Data)
# Scatter plots (Capital_Expenditure vs each land-based revenue)
ggplot(Cleaned_TaMA_Data, aes(x = Act_Permit, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Permit Fees", x = "Permit Fees", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Permit, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3437605 -2450679 -1718004 590953 9146140
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5198859.921 3889459.148 1.337 0.214
## Act_Permit 6.261 26.579 0.236 0.819
##
## Residual standard error: 4181000 on 9 degrees of freedom
## Multiple R-squared: 0.006127, Adjusted R-squared: -0.1043
## F-statistic: 0.05548 on 1 and 9 DF, p-value: 0.8191
ggplot(Cleaned_TaMA_Data, aes(x = Act_Property_Rates, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Property Rates", x = "Property Rates", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Property_Rates, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3775445 -2182719 -1101813 -14365 9343331
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7650410.936 2871462.275 2.664 0.0259 *
## Act_Property_Rates -3.528 5.767 -0.612 0.5558
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4110000 on 9 degrees of freedom
## Multiple R-squared: 0.03993, Adjusted R-squared: -0.06674
## F-statistic: 0.3743 on 1 and 9 DF, p-value: 0.5558
ggplot(Cleaned_TaMA_Data, aes(x = Act_Stool_Lands, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Stool_Lands, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4152793 -2401123 -558131 900629 7870730
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9233208.82 2484709.38 3.716 0.0048 **
## Act_Stool_Lands -49.00 34.14 -1.435 0.1851
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3784000 on 9 degrees of freedom
## Multiple R-squared: 0.1862, Adjusted R-squared: 0.09579
## F-statistic: 2.059 on 1 and 9 DF, p-value: 0.1851
ggplot(Cleaned_TaMA_Data, aes(x = Act_Licenses, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Licenses", x = "Licenses", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Licenses, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4851704 -1262903 -618076 117199 7917267
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10163778.878 3140902.133 3.236 0.0102 *
## Act_Licenses -6.770 4.831 -1.401 0.1946
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3800000 on 9 degrees of freedom
## Multiple R-squared: 0.1791, Adjusted R-squared: 0.08791
## F-statistic: 1.964 on 1 and 9 DF, p-value: 0.1946
ggplot(Cleaned_TaMA_Data, aes(x = Act_Fees, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Act Fees", x = "Act Fees", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Fees, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5155967 -1790378 -698778 222045 8490030
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9740831.34 3699099.26 2.633 0.0272 *
## Act_Fees -12.13 11.55 -1.050 0.3212
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3959000 on 9 degrees of freedom
## Multiple R-squared: 0.1091, Adjusted R-squared: 0.01009
## F-statistic: 1.102 on 1 and 9 DF, p-value: 0.3212
None of the simple linear regressions of Capital_Expenditure on a single land-based revenue is statistically significant.
# Recurrent Expenditure Trend
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = Recrrent_Expenditure)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Recurrent Expenditure Trend ",
x = "Year",
y = "Recurrent Expenditure (Ghana Cedis)"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
labs(
title = "Land-Based Revenue Trend",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Recurrent Expenditure and Land-Based Revenue Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent_Expenditure"), size = 1) +
labs(
title = "Recurrent Exp. vs. Land-Based Revenue Trend",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plots above show how Recurrent Expenditure and the individual land-based revenue streams trend over the years.
# Recurrent Expenditure vs Land-Based Revenues
model_recurrent_Expenditure_land <- lm(Recrrent_Expenditure ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_TaMA_Data)
summary(model_recurrent_Expenditure_land)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Permit + Act_Property_Rates +
## Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_TaMA_Data)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9 10
## -278626 619408 -279093 -173649 3595 -401559 238552 251145 552854 -403472
## 11
## -129154
## attr(,"label")
## [1] "Recrrent Expenditure"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 492927.3537 609203.7422 0.809 0.4552
## Act_Permit 2.7058 5.5892 0.484 0.6488
## Act_Property_Rates -0.7208 1.7618 -0.409 0.6994
## Act_Stool_Lands 7.3281 9.2594 0.791 0.4646
## Act_Licenses 6.5748 2.2772 2.887 0.0343 *
## Act_Fees -7.6407 4.1222 -1.854 0.1230
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 516900 on 5 degrees of freedom
## Multiple R-squared: 0.9016, Adjusted R-squared: 0.8032
## F-statistic: 9.164 on 5 and 5 DF, p-value: 0.01479
cor_matrix_land_recurrent_Expenditure <- cor(Cleaned_TaMA_Data[, c("Recrrent_Expenditure", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_recurrent_Expenditure)
## Recrrent_Expenditure Act_Permit Act_Property_Rates
## Recrrent_Expenditure 1.0000000 0.2884273 0.5755325
## Act_Permit 0.2884273 1.0000000 0.7570547
## Act_Property_Rates 0.5755325 0.7570547 1.0000000
## Act_Stool_Lands 0.8276869 0.1288664 0.4519784
## Act_Licenses 0.8805044 0.4923371 0.7991663
## Act_Fees 0.6610519 0.6205540 0.8356292
## Act_Stool_Lands Act_Licenses Act_Fees
## Recrrent_Expenditure 0.8276869 0.8805044 0.6610519
## Act_Permit 0.1288664 0.4923371 0.6205540
## Act_Property_Rates 0.4519784 0.7991663 0.8356292
## Act_Stool_Lands 1.0000000 0.7974832 0.6528500
## Act_Licenses 0.7974832 1.0000000 0.9066815
## Act_Fees 0.6528500 0.9066815 1.0000000
corrplot(cor_matrix_land_recurrent_Expenditure)
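The multiple regression of Recurrent Expenditure on the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees) is statistically significant overall (F = 9.164 on 5 and 5 DF, p = 0.01479), with an R-squared of 0.9016 and an adjusted R-squared of 0.8032. However, licenses is the only individually significant term (p = 0.0343). The correlation matrix shows that several predictors are strongly correlated with one another (for example, licenses and fees, r = 0.91), which likely inflates the standard errors of the individual coefficients. With only 11 observations and 5 residual degrees of freedom, these estimates should be read with caution.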
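A significant overall F-test combined with mostly non-significant individual coefficients is a common symptom of multicollinearity. As a rough check, the variance inflation factors (VIFs) can be read off the diagonal of the inverse of the predictor correlation matrix. The sketch below rebuilds that matrix from the values printed above; the object names `preds`, `R_pred`, and `vif_values` are illustrative and not part of the original analysis.

```r
# Predictor names in the order used by the model above.
preds <- c("Act_Permit", "Act_Property_Rates", "Act_Stool_Lands",
           "Act_Licenses", "Act_Fees")

# Pairwise correlations among the predictors, copied from the printed
# correlation matrix (the Recrrent_Expenditure row and column are dropped).
R_pred <- matrix(c(
  1.0000000, 0.7570547, 0.1288664, 0.4923371, 0.6205540,
  0.7570547, 1.0000000, 0.4519784, 0.7991663, 0.8356292,
  0.1288664, 0.4519784, 1.0000000, 0.7974832, 0.6528500,
  0.4923371, 0.7991663, 0.7974832, 1.0000000, 0.9066815,
  0.6205540, 0.8356292, 0.6528500, 0.9066815, 1.0000000
), nrow = 5, byrow = TRUE, dimnames = list(preds, preds))

# The VIF of predictor j is the j-th diagonal element of the inverse
# correlation matrix, i.e. 1 / (1 - R_j^2) from regressing predictor j
# on the remaining predictors.
vif_values <- diag(solve(R_pred))
print(round(vif_values, 2))
```

A common rule of thumb treats VIFs above roughly 5 to 10 as a sign that a coefficient's standard error is substantially inflated, although with a sample this small any such threshold is only indicative.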
# Simple linear Regression Analysis
model_permit <- lm(Recrrent_Expenditure ~ Act_Permit, data = Cleaned_TaMA_Data)
model_property <- lm(Recrrent_Expenditure ~ Act_Property_Rates, data = Cleaned_TaMA_Data)
model_stool <- lm(Recrrent_Expenditure ~ Act_Stool_Lands, data = Cleaned_TaMA_Data)
model_license <- lm(Recrrent_Expenditure ~ Act_Licenses, data = Cleaned_TaMA_Data)
model_acts <- lm(Recrrent_Expenditure ~ Act_Fees, data = Cleaned_TaMA_Data)
# Scatter plots (Recurrent Expenditure vs each land-based revenue)
ggplot(Cleaned_TaMA_Data, aes(x = Act_Permit, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Permit Fees", x = "Permit Fees", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Permit, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1797213 -640446 -367814 970700 1769739
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1746628.168 1093933.811 1.597 0.145
## Act_Permit 6.756 7.476 0.904 0.390
##
## Residual standard error: 1176000 on 9 degrees of freedom
## Multiple R-squared: 0.08319, Adjusted R-squared: -0.01868
## F-statistic: 0.8167 on 1 and 9 DF, p-value: 0.3897
ggplot(Cleaned_TaMA_Data, aes(x = Act_Property_Rates, y = Recrrent_Expenditure))+
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Property Rates", x = "Property Rates", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Property_Rates, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1452006 -680904 -225057 968073 1355844
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1345148.131 701804.221 1.917 0.0875 .
## Act_Property_Rates 2.976 1.409 2.111 0.0639 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1004000 on 9 degrees of freedom
## Multiple R-squared: 0.3312, Adjusted R-squared: 0.2569
## F-statistic: 4.458 on 1 and 9 DF, p-value: 0.06393
ggplot(Cleaned_TaMA_Data, aes(x = Act_Stool_Lands, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Stool_Lands, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -909070 -319086 -14245 104187 1674063
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 902608.82 452645.26 1.994 0.07729 .
## Act_Stool_Lands 27.52 6.22 4.425 0.00166 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 689300 on 9 degrees of freedom
## Multiple R-squared: 0.6851, Adjusted R-squared: 0.6501
## F-statistic: 19.58 on 1 and 9 DF, p-value: 0.00166
ggplot(Cleaned_TaMA_Data, aes(x = Act_Licenses, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Licenses", x = "Licenses", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Licenses, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -851905 -385766 35648 374513 878344
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 185010.1910 481232.6418 0.384 0.709571
## Act_Licenses 4.1244 0.7402 5.572 0.000346 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 582200 on 9 degrees of freedom
## Multiple R-squared: 0.7753, Adjusted R-squared: 0.7503
## F-statistic: 31.05 on 1 and 9 DF, p-value: 0.0003463
ggplot(Cleaned_TaMA_Data, aes(x = Act_Fees, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Act Fees", x = "Act Fees", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Fees, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1572651 -497381 -98034 418160 1516069
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 527634.468 861117.402 0.613 0.5552
## Act_Fees 7.107 2.689 2.643 0.0268 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 921600 on 9 degrees of freedom
## Multiple R-squared: 0.437, Adjusted R-squared: 0.3744
## F-statistic: 6.985 on 1 and 9 DF, p-value: 0.02678
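The simple linear regressions of Recurrent Expenditure on each land-based revenue find licenses (p = 0.000346), stool lands revenue (p = 0.00166), and fees (p = 0.0268) to be significant at the 5% level. Property rates (p = 0.0639) and permit fees (p = 0.390) are not.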
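For a regression with a single predictor, R-squared equals the squared Pearson correlation, and the slope's t-statistic follows from the correlation and the sample size alone. The sketch below recomputes both from the correlations printed in the matrix earlier, assuming n = 11 observations (9 residual degrees of freedom, as reported in the summaries). The object names are illustrative.

```r
# Correlations of Recrrent_Expenditure with each revenue stream, copied
# from the first row of the printed correlation matrix.
r <- c(
  Act_Permit         = 0.2884273,
  Act_Property_Rates = 0.5755325,
  Act_Stool_Lands    = 0.8276869,
  Act_Licenses       = 0.8805044,
  Act_Fees           = 0.6610519
)

n <- 11            # number of observations (assumed from the summaries)
df_resid <- n - 2  # residual degrees of freedom for one predictor

# R-squared of a simple regression is the squared correlation.
r_squared <- r^2

# t-statistic of the slope and its two-sided p-value.
t_stat  <- r * sqrt(df_resid) / sqrt(1 - r_squared)
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

check <- data.frame(
  r_squared = round(r_squared, 4),
  t_stat    = round(t_stat, 3),
  p_value   = signif(p_value, 3)
)
print(check)
```

The resulting table should agree with the `summary()` outputs above up to rounding, which is a quick way to confirm that the scatter plots, the correlation matrix, and the fitted models all describe the same data.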
# Population Trend
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = Population)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Population Trend Over Years",
x = "Year",
y = "Population"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Fees"), size = 1) +
labs(
title = "Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Population and Land-Based Revenue Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Fees"), size = 1) +
geom_line(aes(y = Population, color = "Population"), size = 1) +
labs(
title = "Population vs. Land-Based Revenue Trends Over Years",
x = "Year",
y = "Population (persons) / Revenue (Ghana Cedis)",
color = "Series"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plot above shows population alongside each land-based revenue stream over the study period.
# Population vs Land-Based Revenues
model_Population_land <- lm(Population ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_TaMA_Data)
summary(model_Population_land)
##
## Call:
## lm(formula = Population ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_TaMA_Data)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9 10 11
## -10371 -13809 -15694 -20345 44737 12166 -1995 7479 13398 -2016 -13548
## attr(,"label")
## [1] "Population"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 281493.92811 31339.85626 8.982 0.000285 ***
## Act_Permit 0.26249 0.28753 0.913 0.403156
## Act_Property_Rates -0.09915 0.09063 -1.094 0.323845
## Act_Stool_Lands 0.37314 0.47634 0.783 0.468885
## Act_Licenses 0.23853 0.11715 2.036 0.097340 .
## Act_Fees 0.36411 0.21206 1.717 0.146624
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26590 on 5 degrees of freedom
## Multiple R-squared: 0.9631, Adjusted R-squared: 0.9262
## F-statistic: 26.09 on 5 and 5 DF, p-value: 0.001366
cor_matrix_land_Population <- cor(Cleaned_TaMA_Data[, c("Population", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_Population)
## Population Act_Permit Act_Property_Rates Act_Stool_Lands
## Population 1.0000000 0.5265332 0.7545774 0.7943878
## Act_Permit 0.5265332 1.0000000 0.7570547 0.1288664
## Act_Property_Rates 0.7545774 0.7570547 1.0000000 0.4519784
## Act_Stool_Lands 0.7943878 0.1288664 0.4519784 1.0000000
## Act_Licenses 0.9616949 0.4923371 0.7991663 0.7974832
## Act_Fees 0.9322003 0.6205540 0.8356292 0.6528500
## Act_Licenses Act_Fees
## Population 0.9616949 0.9322003
## Act_Permit 0.4923371 0.6205540
## Act_Property_Rates 0.7991663 0.8356292
## Act_Stool_Lands 0.7974832 0.6528500
## Act_Licenses 1.0000000 0.9066815
## Act_Fees 0.9066815 1.0000000
corrplot(cor_matrix_land_Population)
The multiple regression of Population on the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees) is statistically significant overall (F = 26.09 on 5 and 5 DF, p = 0.001366). The R-squared of 0.9631 and adjusted R-squared of 0.9262 indicate a good model fit. However, none of the individual terms is significant at the 5% level; licenses comes closest (p = 0.0973).
The correlation matrix shows that Population is moderately to very strongly correlated with all the land-based revenues, from r = 0.53 for permit fees to r = 0.96 for licenses.
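The adjusted R-squared and the overall F-statistic quoted above are both functions of R-squared, the number of observations, and the number of predictors. The sketch below recomputes them for the Population model from the reported R-squared; `fit_stats` is a hypothetical helper written for this illustration, not a function from the original analysis.

```r
# Recompute adjusted R-squared, the overall F-statistic, and its p-value
# from R-squared, the sample size n, and the number of predictors p.
fit_stats <- function(r_squared, n, p) {
  df_model <- p
  df_resid <- n - p - 1
  adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_resid
  f_stat <- (r_squared / df_model) / ((1 - r_squared) / df_resid)
  p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)
  c(adj_r_squared = adj_r_squared, f_stat = f_stat, p_value = p_value)
}

# Population model: R-squared = 0.9631, 11 observations, 5 predictors.
pop_stats <- fit_stats(r_squared = 0.9631, n = 11, p = 5)
print(round(pop_stats, 4))
```

Small differences from the printed summary are expected because the input R-squared is rounded to four decimals. The same helper can be applied to any of the other models by substituting their reported R-squared and predictor count.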
# Simple linear Regression Analysis
model_permit <- lm(Population ~ Act_Permit, data = Cleaned_TaMA_Data)
model_property <- lm(Population ~ Act_Property_Rates, data = Cleaned_TaMA_Data)
model_stool <- lm(Population ~ Act_Stool_Lands, data = Cleaned_TaMA_Data)
model_license <- lm(Population ~ Act_Licenses, data = Cleaned_TaMA_Data)
model_acts <- lm(Population ~ Act_Fees, data = Cleaned_TaMA_Data)
# Scatter plots (Population vs each land-based revenue)
ggplot(Cleaned_TaMA_Data, aes(x = Act_Permit, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Permit Fees", x = "Permit Fees", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = Population ~ Act_Permit, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -109652 -79286 38029 75939 102059
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 408792.9922 81577.4935 5.011 0.000728 ***
## Act_Permit 1.0358 0.5575 1.858 0.096117 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 87700 on 9 degrees of freedom
## Multiple R-squared: 0.2772, Adjusted R-squared: 0.1969
## F-statistic: 3.452 on 1 and 9 DF, p-value: 0.09612
ggplot(Cleaned_TaMA_Data, aes(x = Act_Property_Rates, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Property Rates", x = "Property Rates", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = Population ~ Act_Property_Rates, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -69523 -50545 -23316 23172 125095
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 404990.32593 47298.20113 8.562 0.0000128 ***
## Act_Property_Rates 0.32767 0.09498 3.450 0.00728 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 67690 on 9 degrees of freedom
## Multiple R-squared: 0.5694, Adjusted R-squared: 0.5215
## F-statistic: 11.9 on 1 and 9 DF, p-value: 0.007278
ggplot(Cleaned_TaMA_Data, aes(x = Act_Stool_Lands, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = Population ~ Act_Stool_Lands, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -89162 -36299 486 34026 92323
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 408759.3433 41148.2652 9.934 0.00000378 ***
## Act_Stool_Lands 2.2183 0.5654 3.923 0.00349 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 62660 on 9 degrees of freedom
## Multiple R-squared: 0.6311, Adjusted R-squared: 0.5901
## F-statistic: 15.39 on 1 and 9 DF, p-value: 0.003492
ggplot(Cleaned_TaMA_Data, aes(x = Act_Licenses, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Licenses", x = "Licenses", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = Population ~ Act_Licenses, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37911 -20747 -1580 18272 47501
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 323140.48642 23372.58391 13.83 0.000000229 ***
## Act_Licenses 0.37835 0.03595 10.53 0.000002334 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28280 on 9 degrees of freedom
## Multiple R-squared: 0.9249, Adjusted R-squared: 0.9165
## F-statistic: 110.8 on 1 and 9 DF, p-value: 0.000002334
ggplot(Cleaned_TaMA_Data, aes(x = Act_Fees, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Act Fees", x = "Act Fees", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = Population ~ Act_Fees, data = Cleaned_TaMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -61150 -25894 8218 28052 46155
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 297041.2030 34887.1278 8.514 0.0000134 ***
## Act_Fees 0.8418 0.1089 7.727 0.0000292 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 37340 on 9 degrees of freedom
## Multiple R-squared: 0.869, Adjusted R-squared: 0.8544
## F-statistic: 59.7 on 1 and 9 DF, p-value: 0.00002919
The simple linear regressions of Population on each land-based revenue find all of them to be significant at the 5% level except permit fees (p = 0.0961).
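Because the five simple regressions share the same structure, they can also be fitted in one pass and collected into a single comparison table. The sketch below shows one way to do that in base R. It runs on a made-up data frame (`toy`) that only mimics the column names of `Cleaned_TaMA_Data`, so its numbers are not results from this study; `fit_one` is a hypothetical helper.

```r
# Hypothetical stand-in for Cleaned_TaMA_Data: the column names match the
# analysis, but every value below is simulated for illustration only.
set.seed(42)
n <- 11
predictors <- c("Act_Permit", "Act_Property_Rates", "Act_Stool_Lands",
                "Act_Licenses", "Act_Fees")
toy <- data.frame(Population = seq(411000, 701000, length.out = n))
for (p in predictors) {
  toy[[p]] <- toy$Population * runif(1, 0.5, 2) + rnorm(n, sd = 50000)
}

# Fit response ~ predictor and return the slope, R-squared, and the
# p-value of the slope as a one-row data frame.
fit_one <- function(predictor, response, data) {
  fit <- lm(reformulate(predictor, response = response), data = data)
  s <- summary(fit)
  data.frame(
    predictor = predictor,
    slope     = unname(coef(fit)[2]),
    r_squared = s$r.squared,
    p_value   = s$coefficients[2, "Pr(>|t|)"]
  )
}

# One row per predictor.
results <- do.call(
  rbind,
  lapply(predictors, fit_one, response = "Population", data = toy)
)
print(results)
```

Replacing `toy` with the real data frame, and `"Population"` with another response such as `"Recrrent_Expenditure"`, would produce the same kind of table without reusing and overwriting model object names between sections.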